The advance of the Human Microbiome Project provides unprecedented opportunities for exploring the critical roles played by commensal microbiota in human health, immune maintenance, and disease. Massive metagenomic sequencing data have been produced in this rapidly growing research area. Unique features of these data, including extremely high dimensionality, complex correlation, zero-inflation, and compositional structure, pose a major challenge for analysis in terms of both methodology and computation, and render many existing statistical approaches inapplicable. Ignoring or inappropriately handling these features is likely to lead to distorted medical conclusions. Unfortunately, few formal analysis tools are available to address these challenges. The existing ones are mainly data transformation and dimension reduction methods (typically PCA) for mediation analysis, which lack direct interpretability of the results, and penalized variable selection methods, which cannot handle longitudinal response variables or high-dimensional functional and compositional covariates. This proposal is devoted to developing a new set of statistically systematic and computationally efficient methods that use complex, high-throughput microbial taxa measurements to explore associations with treatment-related infection in disease.
The specific aims are: (1) developing a clustering mediation model system to study the mediating effects of microbiota on the association between chemotherapy and infections in acute myeloid leukemia (AML); (2) performing variable selection and covariance estimation for longitudinal microbial alpha-diversity in varying-coefficient models with high-dimensional, compositional taxa measurements as covariates in AML; (3) identifying important microbial taxa, whose measurements have two unique features, functional (measured over time) and compositional (relative abundance), that are associated with and predictive of chemotherapy-related infection in AML. Testing and validating the proposed analytical tools and developing software are two accompanying secondary aims. Mediation analysis, clustering, functional varying-coefficient models, functional regression models, data-adaptive regularization for model selection, and standard and non-standard theory for statistical tests are among the major statistical components of the proposal.

Public Health Relevance

Successful completion of the proposed research will lead to efficient and effective statistical models and methods for exploring associations among disease, treatment-related infection, and complex high-dimensional microbiome data, which can potentially facilitate the discovery of new targeted medical regimens that modify the microbiome profile. The proposed tools are urgently needed to extract important information from these rich microbiome data and to support sound scientific discovery in biomedical, clinical, and public health research.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Research Project (R01)
Project #
1R01AI143886-01A1
Application #
9885773
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brown, Liliana L
Project Start
2019-09-23
Project End
2024-08-31
Budget Start
2019-09-23
Budget End
2020-08-31
Support Year
1
Fiscal Year
2019
Name
Columbia University (N.Y.)
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032