The human microbiota is thought to have a profound influence on human health. The goal of the Human Microbiome Project (HMP) is to expand our understanding of the human microbiome by generating reference microbiome genomes, identifying "core" genomes, studying how their variation relates to human health, and developing new technologies and informatics tools. Metagenomics and next-generation sequencing technologies have generated huge amounts of sequence data in HMP, and it is becoming very challenging for existing resources and methods to manage and analyze these data. The challenges arise not only from the sheer volume of the sequence data but also from its great diversity and complexity. To address these challenges, we propose several new computational methods to rapidly and effectively analyze very large HMP datasets. (1) Consensus-based meta-assembler and pre-assembly processing. The aim is to significantly improve the assembly of metagenomic sequences. Instead of developing another assembly program, we will build a meta-assembler on top of available assemblers. We will also develop a pre-assembly protocol to filter and handle redundant and problematic sequences. (2) Fast fragment recruitment and large-scale clustering. We plan to develop a fast program to align raw metagenomic reads to reference or homologous genomes. It is intended to fill the gaps between very fast but very stringent mapping programs (e.g. Bowtie), very slow but very sensitive alignment programs (e.g. BLAST), and fast but less sensitive ones (e.g. BLAT). We also plan to enable our clustering program CD-HIT to handle very large next-generation sequencing datasets. (3) Dedicated utilities for annotation and comparison of metagenomes. In recent years, we have developed an HMM-based method for identifying rRNAs from raw reads, a fast method to identify artificial 454 duplicates, an automated workflow for metagenome annotation, a rapid and reliable reciprocal sequence comparison protocol, and a statistical method to compare many metagenomes with a unique visualization interface. We plan to improve these metagenomics-specific tools to achieve much better speed, performance and capability. The methods will be available as open source software, as web servers, or both. We have obtained very promising preliminary results. The proposed tools will help researchers analyze HMP data effectively. Other HMP-related informatics tools for gene prediction, binning and assembly will also benefit greatly from the proposed work.

Public Health Relevance

The large amount of sequence data from the Human Microbiome Project (HMP) creates great challenges for data analysis. This proposal aims to address these challenges by developing novel and effective computational methods for metagenome assembly, annotation and comparison. The proposed methods will help researchers rapidly perform preliminary data analysis, annotation, clinical sample comparison, novel gene discovery and other analyses.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-N (50))
Program Officer: Proctor, Lita
University of California San Diego
Anatomy/Cell Biology
Schools of Medicine
La Jolla
United States
Zhu, Zhengwei; Niu, Beifang; Chen, Jing et al. (2013) MGAviewer: a desktop visualization tool for analysis of metagenomics alignment data. Bioinformatics 29:122-3
Shin, Joo Heon; Li, Robert W; Gao, Yuan et al. (2013) Butyrate Induced IGF2 Activation Correlated with Distinct Chromatin Signatures Due to Histone Modification. Gene Regul Syst Bio 7:57-70
Wu, Sitao; Li, Robert W; Li, Weizhong et al. (2012) Worm burden-dependent disruption of the porcine colon microbiota by Trichuris suis infection. PLoS One 7:e35470
Baldwin 6th, Ransom L; Wu, Sitao; Li, Weizhong et al. (2012) Quantification of Transcriptome Responses of the Rumen Epithelium to Butyrate Infusion using RNA-seq Technology. Gene Regul Syst Bio 6:67-80
Fu, Limin; Niu, Beifang; Zhu, Zhengwei et al. (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150-2
Li, Robert W; Wu, Sitao; Baldwin 6th, Ransom L et al. (2012) Perturbation dynamics of the rumen microbiota in response to exogenous butyrate. PLoS One 7:e29392
Li, Weizhong; Fu, Limin; Niu, Beifang et al. (2012) Ultrafast clustering algorithms for metagenomic sequence analysis. Brief Bioinform 13:656-68
Wu, Sitao; Li, Congjun; Huang, Wen et al. (2012) Alternative splicing regulated by butyrate in bovine epithelial cells. PLoS One 7:e39182
Wu, Sitao; Li, Robert W; Li, Weizhong et al. (2012) Transcriptome characterization by RNA-seq unravels the mechanisms of butyrate-induced epigenomic regulation in bovine cells. PLoS One 7:e36940
Li, Robert W; Wu, Sitao; Li, Weizhong et al. (2012) Alterations in the porcine colon microbiota induced by the gastrointestinal nematode Trichuris suis. Infect Immun 80:2150-7
