The human microbiome contributes essential and complementary genetic and metabolic components to its human host. Until recently, microbiologists mainly studied individual culturable microbial species, even though the vast majority (approximately 95%-98%) of microorganisms cannot be grown in pure culture. Facilitated by rapid advances in DNA sequencing techniques, metagenomics attempts to directly determine the whole collection of genes within an environmental sample. To study the human microbiome at a global level, metagenomics has become the methodology of choice for the Human Microbiome Project (HMP). We propose to develop computational methods that address several challenges in the metagenomic analysis of HMP data, namely the assembly of short reads from pyrosequencing, the functional annotation of protein-coding genes through database searching, and the characterization of biodiversity in samples. We start with a novel approach to assembling short metagenomic reads, called ORFome Assembly, which assembles putative ORFs from homologous proteins in the same family into a protein family graph (an Eulerian path approach). We then propose a network matching approach to similarity search that uses the protein family graphs as queries. We anticipate that using protein family graphs will yield database searches with higher sensitivity and specificity than using unassembled sequencing reads. Finally, we propose to develop computational tools that simultaneously assess the biodiversity and biological functions in samples by identifying, from the similarity search results, the most likely set of coherent pathway variants covering the annotated gene functions within the metagenomic data.
These software tools will enable researchers to analyze HMP data efficiently and effectively, which will enhance understanding of the relationship between the human microbiota (i.e., the microbes living on the surface of and inside the human body) and human diseases, and hasten the development of new or better therapies.
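
To make the phrase "an Eulerian path approach" concrete, the following is a minimal toy sketch, not the project's ORFome Assembly method. It assumes a de Bruijn-style construction in which each distinct k-mer of the peptide fragments becomes an edge between its (k-1)-residue prefix and suffix, and it assumes the resulting graph admits an Eulerian path. The function names, the fragments, and the choice of k are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def de_bruijn_edges(fragments, k):
    """Build a graph whose edges are the distinct k-mers of the fragments.

    Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    """
    kmers = set()
    for frag in fragments:
        for i in range(len(frag) - k + 1):
            kmers.add(frag[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists (illustrative only)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def assemble(fragments, k):
    """Spell the sequence along the Eulerian path of the k-mer graph."""
    path = eulerian_path(de_bruijn_edges(fragments, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical overlapping peptide fragments from one protein family.
print(assemble(["MKVLA", "VLAGT", "AGTWR"], 3))
```

Real metagenomic data would produce branching graphs that capture sequence variation within a protein family rather than a single linear path; this toy example deliberately avoids that complexity.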

Public Health Relevance

We propose to develop computational methods that address several challenges in the metagenomic analysis of Human Microbiome Project (HMP) data. These software tools will enable researchers to analyze HMP data efficiently and effectively, which will enhance understanding of the relationship between the human microbiota and human diseases, and hasten the development of new or better therapies.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG004908-03
Application #: 7910733
Study Section: Special Emphasis Panel (ZRG1-BST-F (50))
Program Officer: Bonazzi, Vivien
Project Start: 2008-09-26
Project End: 2012-07-31
Budget Start: 2010-08-01
Budget End: 2012-07-31
Support Year: 3
Fiscal Year: 2010
Total Cost: $252,700
Indirect Cost:
Name: Indiana University Bloomington
Department:
Type: Other Domestic Higher Education
DUNS #: 006046700
City: Bloomington
State: IN
Country: United States
Zip Code: 47401
Jiao, Dazhi; Ye, Yuzhen; Tang, Haixu (2013) Probabilistic inference of biochemical reactions in microbial communities from metagenomic sequences. PLoS Comput Biol 9:e1002981
Zhang, Quan; Doak, Thomas G; Ye, Yuzhen (2012) Artificial functional difference between microbial communities caused by length difference of sequencing reads. Pac Symp Biocomput :259-70
Human Microbiome Project Consortium (2012) Structure, function and diversity of the healthy human microbiome. Nature 486:207-14
Wu, Yu-Wei; Rho, Mina; Doak, Thomas G et al. (2012) Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics. Bioinformatics 28:i363-i369
Huse, Susan M; Ye, Yuzhen; Zhou, Yanjiao et al. (2012) A core human microbiome as viewed through 16S rRNA sequence clusters. PLoS One 7:e34242
Zhao, Yongan; Tang, Haixu; Ye, Yuzhen (2012) RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28:125-6
Rho, Mina; Wu, Yu-Wei; Tang, Haixu et al. (2012) Diverse CRISPRs evolving in human microbiomes. PLoS Genet 8:e1002441
Wang, Mingjie; Ye, Yuzhen; Tang, Haixu (2012) A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community. J Comput Biol 19:814-25
Human Microbiome Project Consortium (2012) A framework for human microbiome research. Nature 486:215-21
Wu, Yu-Wei; Ye, Yuzhen (2011) A novel abundance-based algorithm for binning metagenomic sequences using l-tuples. J Comput Biol 18:523-34

Showing the most recent 10 out of 17 publications