Phylogenetic Binning of Metagenomic Sequence Data

Allen, Eric

Abstract

Culture-independent metagenomic studies are essential for understanding our relationship with the organisms comprising the human microbiome, defining optimal microbial composition to maintain health, and devising selective treatment strategies to eliminate pathogens without harming beneficial species. To use metagenomic data effectively, raw DNA sequence data (reads) must be processed computationally (assembled) to obtain longer sequences (contigs). Existing software packages for this purpose are quite inefficient when presented with large, taxonomically diverse samples, resulting in considerable wastage of reads that cannot be assembled. Efforts to maximize assembly efficiency by relaxing stringency can lead to inappropriate joining of sequences from unrelated organisms (chimeric artifacts), compromising data accuracy and usefulness. Taxonomic binning of raw reads as a pre-filtering step is expected to improve metagenomic sequence assembly efficiency, reducing statistical noise due to sample complexity and allowing incorporation of raw reads into longer, more informative contigs without incurring chimeric artifacts. Benefits should be especially significant for less abundant species in complex mixtures. We have developed methods to quantify taxonomic binning program performance and assembly improvements in real metagenomic data sets, including reproducible calibration standards, to enable efficient parameter optimization for existing software and provide reliable benchmarks for future software development.
Our specific aims are to 1) develop new computational methods for large-scale taxonomic classification of metagenomic sequence data, applicable to raw reads as well as assembled contigs; 2) develop software and protocols to use taxonomic data binning as a pre-treatment to increase efficiency of existing sequence assembly software; 3) benchmark performance enhancement for different assembly software programs using quantitative, statistical tests with both artificially created models and real-life metagenomic data sets of varying size and complexity; 4) make new computational methods and performance evaluation tools available to the general scientific community.
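The binning and benchmarking ideas described above can be illustrated with a small sketch. This is not the project's software: the function names (`bin_reads`, `score_binning`), the read IDs, the taxon labels, and the choice of per-taxon precision and recall as metrics are all assumptions made for illustration. It supposes that some classifier has already assigned a taxon to each read, and that a calibration standard with known true labels is available.

```python
from collections import Counter, defaultdict


def bin_reads(assignments):
    """Group read IDs by predicted taxon so each bin could be assembled separately."""
    bins = defaultdict(list)
    for read_id, taxon in assignments.items():
        bins[taxon].append(read_id)
    return dict(bins)


def score_binning(predicted, truth):
    """Return {taxon: (precision, recall)} for predictions against known labels.

    Reads missing from `predicted` are treated as unbinned: they lower recall
    but do not count against precision. (Hypothetical metric choice.)
    """
    true_positives = Counter()
    predicted_counts = Counter()
    true_counts = Counter(truth.values())

    for read_id, true_taxon in truth.items():
        guess = predicted.get(read_id)
        if guess is None:
            continue
        predicted_counts[guess] += 1
        if guess == true_taxon:
            true_positives[guess] += 1

    scores = {}
    for taxon, n_true in true_counts.items():
        n_pred = predicted_counts[taxon]
        precision = true_positives[taxon] / n_pred if n_pred else 0.0
        recall = true_positives[taxon] / n_true
        scores[taxon] = (precision, recall)
    return scores


# Toy calibration standard (made-up reads and labels).
truth = {
    "r1": "Bacteroides",
    "r2": "Bacteroides",
    "r3": "Escherichia",
    "r4": "Escherichia",
}
# Hypothetical classifier output; r4 was left unbinned.
predicted = {"r1": "Bacteroides", "r2": "Escherichia", "r3": "Escherichia"}

bins = bin_reads(predicted)
scores = score_binning(predicted, truth)
```

In this toy case the misassigned read `r2` lowers precision for one bin and recall for the other, which is the kind of cross-contamination that could lead to chimeric contigs if the bins were assembled as they stand.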

Public Health Relevance

Culture-independent (metagenomic) studies of DNA isolated from organisms comprising the human microbiome provide a promising new technology to explore the phylogenetic distribution, ecological relationships, and metabolic capabilities of these organisms. This knowledge is essential in defining microbial compositions required for maintaining human health, as well as devising selective treatment strategies to eliminate pathogens without harming beneficial species. This project seeks to provide improved computational tools for analysis and interpretation of metagenomic sequence data obtained from the human microbiome.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG005107-01
Application #: 7708544
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Bonazzi, Vivien

Project Start: 2009-08-24
Project End: 2011-07-31
Budget Start: 2009-08-24
Budget End: 2010-07-31
Support Year: 1
Fiscal Year: 2009
Total Cost: $187,130
Indirect Cost

Institution

Name: University of California San Diego
Department: Zoology
Type: Schools of Earth Sciences/Natur
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2010 R21 HG	Phylogenetic Binning of Metagenomic Sequence Data Allen, Eric Ellsworth / University of California San Diego	$193,125
NIH 2009 R21 HG	Phylogenetic Binning of Metagenomic Sequence Data Allen, Eric Ellsworth / University of California San Diego	$187,130

Publications

Podell, Sheila; Ugalde, Juan A; Narasingarao, Priya et al. (2013) Assembly-driven community genomics of a hypersaline microbial ecosystem. PLoS One 8:e61692

Ugalde, Juan A; Podell, Sheila; Narasingarao, Priya et al. (2011) Xenorhodopsins, an enigmatic new class of microbial rhodopsins horizontally transferred between archaea and bacteria. Biol Direct 6:52

Jones, Adam C; Monroe, Emily A; Podell, Sheila et al. (2011) Genomic insights into the physiology and ecology of the marine filamentous cyanobacterium Lyngbya majuscula. Proc Natl Acad Sci U S A 108:8815-20
