Assembly and analysis software for exploring the human microbiome

Pop, Mihai

Abstract

Bacteria are the most abundant organisms on Earth, yet little is known about most members of this domain of life. Only about 1% of bacterial species can be easily grown in culture, and considerably fewer have been sequenced. Advances in sequencing technologies have made it possible to sequence bacteria directly from the environment, providing a dramatic new outlook on the diversity of bacteria populating our world. Initial studies have explored the bacteria present in mines, ocean water, and soil, as well as communities of commensal microbes that inhabit the human body. The latter have provided a glimpse at the complex symbiotic relationships between bacteria and their human hosts. Despite an increased interest in environmental sequencing (metagenomics), few specialized computational algorithms exist for the analysis of such data. For example, the assembly of environmental data is being performed with software originally intended for homogeneous DNA sources, such as clonal bacterial populations or inbred eukaryotes. These programs are ill-suited to the assembly of heterogeneous microbial communities and numerous "hacks" have been necessary to produce the assemblies published to date. This proposal aims to fill the need for specialized software for assembling and finding genes in metagenomic datasets. A particular focus will be on developing tools for uncovering genomic variation within the assemblies of microbial communities. The proposed software will specifically address issues arising from the use of new sequencing technologies in metagenomic projects. The low cost and high throughput of these technologies will allow a far deeper exploration of the microbial biosphere than was previously possible. Their broad application, however, depends on the availability of software systems adapted to their specific characteristics.
In addition, new algorithms will be developed to allow the individual components of a metagenomic analysis pipeline to be tightly integrated, with the goal of improving the overall quality of both assembly and annotation, and to facilitate the extraction of other types of information from large sets of metagenomic data. The proposal further aims to investigate the impact of experimental design and choice of sequencing technology on the ability to assemble and analyze metagenomic data, through the development of software for simulating bacterial populations and emulating a variety of sequencing strategies. Better experimental design can reduce the high costs currently associated with environmental sequencing and enhance subsequent analyses. All software developed as part of this proposal, as well as any simulated data and results of reanalyzing public datasets, will be released freely through public databases and open-source software repositories.

Public Health Relevance

Initial explorations of the communities of bacteria that inhabit our bodies have already provided insights into the complex relationships between microbes and the human host, as well as the contribution of bacteria to diseases such as obesity and inflammatory bowel disease. Many more studies will be needed to help us fully understand the complex human-microbe interactions and to translate these discoveries into new therapies. The current proposal provides scientists with components of the software infrastructure that will be essential for genomic studies of the human microbiome.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004885-03
Application #: 7897887
Study Section: Special Emphasis Panel (ZRG1-BST-F (50))
Program Officer: Bonazzi, Vivien

Project Start: 2008-09-24
Project End: 2012-07-31
Budget Start: 2010-08-01
Budget End: 2012-07-31
Support Year: 3
Fiscal Year: 2010
Total Cost: $257,400
Indirect Cost

Institution

Name: University of Maryland College Park
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 790934285

City: College Park
State: MD
Country: United States
Zip Code: 20742

Related projects


NIH 2010 R01 HG	Assembly and analysis software for exploring the human microbiome Pop, Mihai / University of Maryland College Park	$257,400
NIH 2009 R01 HG	Assembly and analysis software for exploring the human microbiome Pop, Mihai / University of Maryland College Park	$260,000
NIH 2009 R01 HG	Assembly and analysis software for exploring the human microbiome Pop, Mihai / University of Maryland College Park	$111,000
NIH 2008 R01 HG	Assembly and analysis software for exploring the human microbiome Pop, Mihai / University of Maryland College Park	$260,000

Publications

Davison, Michelle; Treangen, Todd J; Koren, Sergey et al. (2016) Diversity in a Polymicrobial Community Revealed by Analysis of Viromes, Endolysins and CRISPR Spacers. PLoS One 11:e0160574

Pop, Mihai; Walker, Alan W; Paulson, Joseph et al. (2014) Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition. Genome Biol 15:R76

Treangen, Todd J; Koren, Sergey; Sommer, Daniel D et al. (2013) MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol 14:R2

Paulson, Joseph N; Stine, O Colin; Bravo, Héctor Corrada et al. (2013) Differential abundance analysis for microbial marker-gene surveys. Nat Methods 10:1200-2

Ghodsi, Mohammadreza; Hill, Christopher M; Astrovskaya, Irina et al. (2013) De novo likelihood-based measures for comparing genome assemblies. BMC Res Notes 6:334

Schatz, Michael C; Phillippy, Adam M; Sommer, Daniel D et al. (2013) Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies. Brief Bioinform 14:213-24

Gevers, Dirk; Pop, Mihai; Schloss, Patrick D et al. (2012) Bioinformatics for the Human Microbiome Project. PLoS Comput Biol 8:e1002779

Human Microbiome Project Consortium (2012) Structure, function and diversity of the healthy human microbiome. Nature 486:207-14

Salzberg, Steven L; Phillippy, Adam M; Zimin, Aleksey et al. (2012) GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 22:557-67

Liu, Bo; Faller, Lina L; Klitgord, Niels et al. (2012) Deep sequencing of the oral microbiome reveals signatures of periodontal disease. PLoS One 7:e37919

Showing the most recent 10 out of 23 publications
