The broad, long-term objective of this project is the development of novel statistical methods and computational tools for statistical and probabilistic modeling of human microbiome and shotgun metagenomic data, motivated by important biological questions and experiments. The specific aim of the current project is to develop new statistical models, novel inference procedures, and fast computational algorithms for the analysis of 16S rRNA and shotgun metagenomic sequencing data in large-scale human microbiome studies. The project focuses on developing model-based multi-sample approaches for quantifying microbiome compositions, and on developing methods of compositional mediation analysis to quantify how the microbiome mediates the effect of a treatment or risk factor on outcomes. In addition, this project will develop novel methods for statistical inference, including large-scale multiple testing procedures on sparse discrete Markov random field (MRF) models, for microbial interaction network construction and for differential network analysis. These problems are all motivated by the PI's close collaborations with Penn investigators on metagenomic studies of Crohn disease, childhood obesity, and disease progression among patients with chronic kidney disease (CKD). The methods hinge on a novel integration of biological insights with methods for modeling sparse count data, high-dimensional compositional data analysis, and network-based analysis, including nuclear-norm penalized maximum likelihood estimation for taxa abundance estimation, a compositional mediation model, and Markov random field based microbial network and differential network analysis. The new methods can be applied to both 16S rRNA and shotgun metagenomic sequencing data and will ideally facilitate the identification of microbial compositions, subcompositions, and microbial networks underlying various complex human diseases and biological processes.
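The abstract names nuclear-norm penalized maximum likelihood as the tool for multi-sample taxa abundance estimation but does not describe the fitting algorithm. As a hedged illustration only: a common way to fit nuclear-norm penalized estimators is proximal gradient descent, whose key step is singular value thresholding, the closed-form proximal operator of the nuclear norm. The sketch below shows that operator in Python with NumPy. The function name `singular_value_threshold`, the penalty weight `lam`, and the toy matrix are illustrative assumptions rather than the project's implementation, and the sketch omits the count-data likelihood entirely.

```python
import numpy as np


def singular_value_threshold(X: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of the nuclear-norm penalty lam * ||Z||_*.

    Returns argmin_Z 0.5 * ||Z - X||_F^2 + lam * ||Z||_*, which is obtained
    by soft-thresholding the singular values of X at lam.
    (Illustrative sketch; not the project's estimator.)
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold each singular value
    return (U * s_shrunk) @ Vt           # rebuild X with the shrunken spectrum


if __name__ == "__main__":
    # Hypothetical 3 x 3 matrix with singular values 3, 1 and 0.5.
    X = np.diag([3.0, 1.0, 0.5])
    Z = singular_value_threshold(X, lam=0.8)
    print(np.round(np.linalg.svd(Z, compute_uv=False), 6))  # [2.2 0.2 0. ]
    print(np.linalg.matrix_rank(Z))                          # 2
```

In a proximal-gradient scheme, each iteration would take a gradient step on the negative log-likelihood of the count data and then pass the result through this operator. A larger `lam` drives more singular values to zero and therefore yields a lower-rank estimate shared across samples.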
The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods. In addition, this project will develop practical and feasible computer programs for implementing the proposed methods and for evaluating their performance through extensive simulations and through analysis of various ongoing microbiome studies conducted in the PI's collaborations with Penn physicians and biologists. The work proposed here will contribute statistical methodology for modeling metagenomic sequencing data and high-dimensional compositional data, theoretical inference methods for MRF models, and insights into each of the biological areas represented by the various data sets. All programs developed under this grant, with detailed documentation, will be made available free of charge to interested researchers.

Public Health Relevance

This project aims to develop powerful statistical and computational methods for the analysis of human microbiome data generated by next-generation sequencing. The novel statistical methods are expected to provide more insight into how variations in microbial composition can lead to different phenotypes, such as childhood obesity, progression of chronic kidney disease, and responses to treatment of inflammatory bowel disease. The bacterial taxa identified can potentially serve as biomarkers for disease diagnosis and prognosis.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States
Friedman, Elliot S; Li, Yun; Shen, Ting-Chin David et al. (2018) FXR-Dependent Modulation of the Human Small Intestinal Microbiome by the Bile Acid Derivative Obeticholic Acid. Gastroenterology 155:1741-1752.e5
Gao, Yuan; Li, Hongzhe (2018) Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15:1041-1044