Novel Computational Tools for Studying the Human Microbiome

Fredricks, David; Matsen, Frederick

Abstract

The Human Microbiome Project will generate billions of high-throughput sequence reads from rRNA gene PCR products and metagenomic DNA; these data have the potential to revolutionize our understanding of the microbial inhabitants of humans, the putative functions of these microbes, and their associations with health and disease. However, limitations in processing this flood of data hinder our ability to make inferences or draw conclusions. Specifically, commonly available methods for identifying microbes from DNA or RNA sequences do not identify organisms to the species level, and may fail to assign sequences confidently at the genus level or higher even when the phylogenetic information is sufficient to do so. As a result, many publicly available classification tools lump sequences representing distinct species into less specific taxonomic categories, as we have found when applying these tools to several novel bacteria linked with vaginal disease. This proposal is significant because it offers solutions to these fundamental problems by developing and refining novel computational tools; prototypes of these tools have already demonstrated significantly improved results. Our freely available software will help catalyze research on the human microbiome by increasing the speed, accuracy, and specificity of microbial identification, as well as offering methods for between-sample comparison.
There are several innovative features of this proposal. First, computationally efficient maximum-likelihood phylogenetic placement of sequences on trees will provide a robust method for identifying microbes and distinguishing between novelty and uncertainty. Second, this proposal will provide accurately annotated collections of reference sequences that can facilitate classification of organisms present in major human body sites. More importantly, this proposal will develop software tools that will enable individual researchers to assemble sets of reference sequences using an approach that maximizes sequence diversity within each represented taxon while excluding poor-quality and mislabeled sequences. Third, this proposal will develop new analysis and visualization tools to aid statistical comparison of microbial communities across space and time, and help capture these complex changes in intuitive visualizations.
Aim 1: Develop and optimize phylogenetic placement software for the analysis of 16S rRNA and other phylogenetically informative loci to better describe bacterial diversity and community composition.
This aim will advance the development of our phylogenetic placement software pplacer, including the addition of algorithms for taxonomic annotation and species delineation, implementation of improved measures of uncertainty, and low-level code optimization.
Aim 2: Develop computational tools to curate project-specific sets of reference sequences from public repositories and local sources.
This aim is motivated by our observation that appropriately selected reference sequences and accurate phylogenies are a critical and limiting component of the classification process.
Aim 3: Develop a software pipeline to integrate high-throughput sequencing data analysis, including preprocessing, phylogenetic placement, statistical comparison, and phylogenetic visualization.
This aim will result in two deliverables extending the capabilities of a broad spectrum of researchers: a web service for users who value simplicity, as well as R / Bioconductor software packages for users who value modularity, reproducibility, and extensibility.

Public Health Relevance

Human-associated microbes can have a major impact on human health, either by promoting beneficial interactions (such as facilitating nutrient absorption) or by damaging host tissues, thereby producing disease. New sequencing technologies provide an unprecedented opportunity to explore the relationships between microbes and humans, but computational tools for characterizing microbial populations have not kept pace with the sequencing technology. This project seeks to close this gap by developing computational tools for analyzing high-throughput sequence data so that the full power of sequencing technologies can be used to accurately identify microbes and assess their relationships with human health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005966-03
Application #: 8307930
Study Section: Special Emphasis Panel (ZRG1-GGG-N (50))
Program Officer: Proctor, Lita

Project Start: 2010-09-27
Project End: 2014-06-30
Budget Start: 2012-07-01
Budget End: 2014-06-30
Support Year: 3
Fiscal Year: 2012
Total Cost: $491,177
Indirect Cost: $178,354

Institution

Name: Fred Hutchinson Cancer Research Center
DUNS #: 078200995

City: Seattle
State: WA
Country: United States
Zip Code: 98109

Related projects


NIH 2012 R01 HG: Novel Computational Tools for Studying the Human Microbiome. Fredricks, David Neal; Matsen, Frederick Albert / Fred Hutchinson Cancer Research Center. $491,177
NIH 2011 R01 HG: Novel Computational Tools for Studying the Human Microbiome. Fredricks, David Neal; Matsen, Frederick Albert / Fred Hutchinson Cancer Research Center. $495,844
NIH 2010 R01 HG: Novel Computational Tools for Studying the Human Microbiome. Fredricks, David Neal; Matsen, Frederick Albert / Fred Hutchinson Cancer Research Center. $533,722

Publications

Gall, Alevtina; Fero, Jutta; McCoy, Connor et al. (2015) Bacterial Composition of the Human Upper Gastrointestinal Tract Microbiome Is Dynamic and Associated with Genomic Instability in a Barrett's Esophagus Cohort. PLoS One 10:e0129055

Matsen 4th, Frederick A (2015) Phylogenetics and the human microbiome. Syst Biol 64:e26-41

Srinivasan, Sujatha; Morgan, Martin T; Liu, Congzhou et al. (2013) More than meets the eye: associations of vaginal bacteria with gram stain morphotypes using molecular phylogenetic analysis. PLoS One 8:e78633

McCoy, Connor O; Gallagher, Aaron; Hoffman, Noah G et al. (2013) Nestly--a framework for running software with nested parameter choices and aggregating results. Bioinformatics 29:387-8

Matsen 4th, Frederick A; Evans, Steven N (2013) Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison. PLoS One 8:e56859

Nipperess, David A; Matsen 4th, Frederick A (2013) The mean and variance of phylogenetic diversity under rarefaction. Methods Ecol Evol 4:566-572

Matsen, Frederick A; Gallagher, Aaron (2012) Reconciling taxonomy and phylogenetic inference: formalism and algorithms for describing discord and inferring taxonomic roots. Algorithms Mol Biol 7:8

Evans, Steven N; Matsen, Frederick A (2012) The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. J R Stat Soc Series B Stat Methodol 74:569-592

Srinivasan, Sujatha; Hoffman, Noah G; Morgan, Martin T et al. (2012) Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. PLoS One 7:e37818

Matsen, Frederick A; Hoffman, Noah G; Gallagher, Aaron et al. (2012) A format for phylogenetic placements. PLoS One 7:e31009
