The human microbiota plays an important role in health and disease, and its therapeutic manipulation is being actively investigated for a wide range of diseases that span every NIH institute. Our microbiota are inherently dynamic, and analyzing these time-dependent properties is key to robustly linking the microbiota to disease and to predicting the effects of therapies targeting the microbiota; indeed, longitudinal microbiome data is being acquired with increasing frequency and is a major component of many NIH-funded projects. However, there is currently a dearth of computational tools for analyzing microbiome time-series data, which presents several special challenges, including high measurement noise, irregular and sparse temporal sampling, and complex dependencies between variables. The objective of this proposal is to introduce new capabilities for, improve on, and provide state-of-the-art implementations of tools for analyzing dynamics, or patterns of change, in microbiome time-series data. The tools we develop will use Bayesian machine learning methods, which are well recognized for their strong conceptual and practical advantages, particularly in biomedical domains. Tools will be rigorously tested and validated on synthetic and real human microbiome data, including publicly available datasets and those from collaborators providing 16S rRNA sequencing, metagenomic, and metabolomics data. We propose three specific aims.
For Aim 1, we will develop integrated Bayesian machine learning tools for predicting population dynamics of the microbiome and its responses to perturbations. These tools will include a new model that simultaneously learns groups of microbes with similar interaction structure and predicts their behavior over time, and that incorporates prior phylogenetic information. The model will be further improved by incorporating stochastic microbial dynamics and errors in measurements throughout the model.
For Aim 2, we will develop Bayesian machine learning tools to predict host status from microbiome dynamics. The tools will learn easily interpretable, human-readable rules that predict host status from microbiome time-series data, and will be further extended to handle a variety of longitudinal study designs.
For Aim 3, we will engineer our microbiome dynamics analysis software tools for optimal performance, ease of use, maintainability, extensibility, and dissemination to the community. In total, the proposed work will yield a suite of contemporary software tools for analyzing microbiome dynamics, with expected broad use and major impact. The software will allow investigators to answer important scientific and translational questions about the microbiome, including discovering which microbial taxa or their metagenomes are affected over time by perturbations such as changes in diet or invasion by pathogens; predicting the effects of these perturbations over time, including changes in composition or stability of the gut microbiota; and finding temporal signatures in multi-omic microbiome data that predict disease risk in the human host.
The human microbiota, or collection of micro-organisms living on and within us, plays an important role in health and, when disrupted or abnormal, may contribute to many types of diseases, including infections, kidney diseases, bowel diseases, diabetes, heart diseases, arthritis, allergies, brain diseases, and cancer. Sophisticated computer-based tools are needed to make sense of human microbiota data, particularly time-series data, which can yield important insights into how our microbiomes change over time. This work will develop new and improved computer-based tools for analyzing microbiota time-series data, which will be made freely available and will enable scientists to increase our fundamental knowledge about how our microbiota affect us and ultimately to apply this knowledge to prevent and treat human illnesses.