The human microbiome is the vast collection of microorganisms living in and on our bodies. While researchers are only beginning to understand the complex roles that these microbes play in human biology, it is clear that specific changes in microbial flora are associated with, and sometimes cause or cure, disease in the host. Most microbiome research to date has focused on describing the taxonomic composition of communities in different body sites (e.g., gut, mouth, elbow skin) or in a single site across disease groups. The goal of this project is to move microbiome research from descriptions of "who is out there?" towards characterizations of "what are they doing?". To do so, the investigators will develop new methodology for analyzing shotgun metagenomic data, which are sequences of pooled DNA extracted from the various microbes in a community. Because the sequences represent short segments of the genomes of many organisms, metagenomics provides a snapshot of the protein repertoire of a microbial community. These rich data hold great promise but also present many challenges for data analysis. To meet these challenges, the investigators will first design and validate a bioinformatics pipeline to classify metagenomic sequences into protein families. The key component of this tool will be hidden Markov models describing the evolutionary profile of every known microbial protein, enabling accurate characterization of the protein functions present in a sample. Second, they will derive novel stochastic models to predict the occurrence of protein families in a metagenomic sample given data about the demographic and clinical characteristics of a patient. These models, based on concepts from ecology and convex geometry, will allow the investigators to estimate, draw, and statistically compare the shapes of protein niches in high-dimensional phenotype space. Finally, they will produce a medical Niche Atlas that will link protein distributions to disease states via a publicly accessible, user-friendly visualization tool and database. This project will produce new mathematical theory and novel computational tools for microbiome research, drug development, and bioprospecting. In addition to their immediate impact on microbiome research, the mathematical results will be useful for spatial modeling in other fields, such as ecology, sociology, and epidemiology.

The aim of this project is to develop computational resources and stochastic models that will shed light on the complex relationship between the health of an individual and the activities of the microbes living in and on his or her body. This research will generate a medical Niche Atlas that will map the distributions of microbial protein functions across individuals with different clinical characteristics (e.g., diseases, diets, or treatments). Just as a geographic atlas makes cartographic expertise accessible to a layperson, our Niche Atlas will allow researchers, students, clinicians, and any curious person to easily explore the functional capabilities of the human microbiome from a computer terminal. Its protein niche maps will visually display the range of patient characteristics over which each microbial protein is likely to occur and how these ranges differ across disease states. We will use this capability as a graduate teaching tool, for outreach and communication through public media, and to establish experimental collaborations that test the hypotheses we generate about the roles of microbiome proteins in human diseases, such as inflammatory bowel disease. These investigations will aim to identify new disease biomarkers, including microbiome proteins that can be used to diagnose the onset of disease in patient subpopulations where early diagnosis is currently difficult. The Niche Atlas will also enable the development of personalized treatments and preventive measures based on knowledge of an individual's microbiome. Together, these freely accessible resources will significantly broaden access to cutting-edge medicine.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1069303
Program Officer
Mary Ann Horn
Project Start
Project End
Budget Start
2011-08-15
Budget End
2016-07-31
Support Year
Fiscal Year
2010
Total Cost
$1,508,535
Indirect Cost
Name
The J. David Gladstone Institutes
Department
Type
DUNS #
City
San Francisco
State
CA
Country
United States
Zip Code
94158