The broad, long-term objective of this project is to develop novel statistical methods and computational tools for statistical and probabilistic modeling of human microbiome and shotgun metagenomic data, motivated by important biological questions and experiments.
The specific aim of the current project is to develop new statistical models, novel inference procedures, and fast computational algorithms for the analysis of 16S rRNA and shotgun metagenomic sequencing data in large-scale human microbiome studies. The project focuses on developing model-based multi-sample approaches for quantifying microbiome compositions and methods of compositional mediation analysis that quantify how the microbiome mediates the effect of a treatment or risk factor on outcomes. In addition, this project will develop novel methods for statistical inference, including large-scale multiple testing procedures on sparse discrete Markov random field (MRF) models, for microbial interaction network construction and for differential network analysis. These problems are all motivated by the PI's close collaborations with Penn investigators on metagenomic studies of Crohn's disease, childhood obesity, and disease progression among patients with chronic kidney disease (CKD). The methods hinge on a novel integration of biological insights with methods for modeling sparse count data, high-dimensional compositional data analysis, and network-based analysis, including nuclear-norm penalized maximum likelihood estimation for taxa abundance estimation, compositional mediation models, and Markov random field based microbial network and differential network analysis. The new methods can be applied to both 16S rRNA and shotgun metagenomic sequencing data and are intended to facilitate the identification of microbial compositions, subcompositions, and microbial networks underlying various complex human diseases and biological processes. The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods.
In addition, this project will develop practical and feasible computer programs for implementing the proposed methods and for evaluating their performance through extensive simulations and analysis of various ongoing microbiome studies conducted through the PI's collaborations with Penn physicians and biologists. The work proposed here will contribute statistical methodology for modeling metagenomic sequencing data and high-dimensional compositional data and theoretical inference methods for the MRF models, and will offer insights into each of the biological areas represented by the various data sets. All programs developed under this grant, together with detailed documentation, will be made available free of charge to interested researchers.

Public Health Relevance

This project aims to develop powerful statistical and computational methods for the analysis of human microbiome data based on next-generation sequencing. The novel statistical methods are expected to yield insights into how variations in microbial composition can lead to different phenotypes, such as childhood obesity, progression of chronic kidney disease, and responses to treatment for inflammatory bowel disease. The bacterial taxa identified can potentially serve as biomarkers for disease diagnosis and prognosis.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM123056-03
Application #
9744730
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2017-09-15
Project End
2021-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Friedman, Elliot S; Li, Yun; Shen, Ting-Chin David et al. (2018) FXR-Dependent Modulation of the Human Small Intestinal Microbiome by the Bile Acid Derivative Obeticholic Acid. Gastroenterology 155:1741-1752.e5
Gao, Yuan; Li, Hongzhe (2018) Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15:1041-1044