Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences

Li, Weizhong

Abstract

The human microbiota is thought to have a profound influence on human health. The goal of the Human Microbiome Project (HMP) is to expand our understanding of the human microbiome by generating reference microbiome genomes, identifying "core" genomes, studying their variation in relation to human health, and developing new technologies and informatics tools. Huge numbers of HMP sequences have been generated using metagenomics and next-generation sequencing technologies. Managing and analyzing the HMP data is becoming very challenging for existing resources and methods, not only because of the huge volume of the sequence data but also because of its great diversity and complexity. To address these challenges, we propose several new computational methods to rapidly and effectively analyze very large HMP datasets. (1) Consensus-based meta-assembler and pre-assembly processing. The aim is to significantly improve the assembly of metagenomic sequences. Instead of developing another assembly program, we will build a meta-assembler on top of available assemblers. We will also develop a pre-assembly protocol to filter and handle redundant and problematic sequences. (2) Fast fragment recruitment and large-scale clustering. We plan to develop a fast program to align raw metagenomic reads to reference or homologous genomes. It is intended to fill the gap between very fast but very stringent mapping programs (e.g. Bowtie), very slow but very sensitive alignment programs (e.g. BLAST), and fast but less sensitive ones (e.g. BLAT). We also plan to enable our clustering program CD-HIT to handle very large next-generation sequencing datasets. (3) Dedicated utilities for annotation and comparison of metagenomes. In recent years, we have developed an HMM-based method for identifying rRNAs from raw reads, a fast method to identify artificial 454 duplicates, an automated workflow for metagenome annotation, a rapid and reliable reciprocal sequence comparison protocol, and a statistical method to compare many metagenomes with a unique visualization interface. We plan to improve these metagenomics-specific tools to achieve much better speed, performance and capability. The methods will be available as open source software, as web servers, or both. We have obtained very promising preliminary results. The proposed tools will effectively help researchers in HMP data analysis. Other HMP-related informatics tools for gene prediction, binning and assembly will also benefit greatly from the proposed work.
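To make the clustering part of aim (2) more concrete, here is a minimal sketch of greedy incremental clustering, the general strategy commonly associated with CD-HIT-style tools: sequences are processed longest first, and each one either joins the first cluster whose representative it resembles or founds a new cluster. The abstract does not describe the algorithm, so everything below is an assumption made for illustration: the function names `kmer_set` and `greedy_cluster`, the word size `k`, the threshold, and the use of a shared k-mer fraction as a crude stand-in for alignment-based sequence identity.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(
    seqs: List[str], k: int = 4, threshold: float = 0.8
) -> Dict[str, List[str]]:
    """Greedy incremental clustering (illustrative sketch, not CD-HIT itself).

    Returns a dict mapping each representative sequence to the list of
    sequences assigned to its cluster (the representative comes first).
    """
    clusters: Dict[str, List[str]] = {}
    rep_kmers: Dict[str, Set[str]] = {}

    # Process the longest sequences first so that each representative is
    # at least as long as every sequence later added to its cluster.
    for seq in sorted(seqs, key=len, reverse=True):
        kmers = kmer_set(seq, k)
        for rep, rk in rep_kmers.items():
            # Hypothetical similarity measure: the fraction of this
            # sequence's k-mers that also occur in the representative.
            # Exact duplicates always match, even if shorter than k.
            if seq == rep or (kmers and len(kmers & rk) / len(kmers) >= threshold):
                clusters[rep].append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[seq] = [seq]
            rep_kmers[seq] = kmers
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGT", "TTTTGGGGCC", "ACGTACGTAC"]
    for rep, members in greedy_cluster(reads).items():
        print(rep, len(members))
```

In this toy input, the exact duplicate and the shorter read that is fully contained in `ACGTACGTAC` fall into one cluster, while the unrelated read forms its own. A production tool would need a real alignment step and indexing to avoid comparing every sequence against every representative; this sketch only shows the control flow.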

Public Health Relevance

The large amount of sequence data from the Human Microbiome Project (HMP) creates great challenges for data analysis. This proposal aims to address these challenges by developing novel and effective computational methods for metagenome assembly, annotation and comparison. The proposed methods will help researchers carry out preliminary data analysis, annotation, clinical sample comparison, novel gene discovery and other analyses rapidly.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005978-03
Application #: 8294893
Study Section: Special Emphasis Panel (ZRG1-GGG-N (50))
Program Officer: Proctor, Lita

Project Start: 2010-09-27
Project End: 2014-06-30
Budget Start: 2012-07-01
Budget End: 2014-06-30
Support Year: 3
Fiscal Year: 2012
Total Cost: $367,063
Indirect Cost: $130,248

Institution

Name: University of California San Diego
Department: Anatomy/Cell Biology
Type: Schools of Medicine
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2012 R01 HG	Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences	Li, Weizhong / University of California San Diego	$367,063
NIH 2011 R01 HG	Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences	Li, Weizhong / University of California San Diego	$367,440
NIH 2010 R01 HG	Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences	Li, Weizhong / University of California San Diego	$394,784

Publications

Zhu, Zhengwei; Niu, Beifang; Chen, Jing et al. (2013) MGAviewer: a desktop visualization tool for analysis of metagenomics alignment data. Bioinformatics 29:122-3

Shin, Joo Heon; Li, Robert W; Gao, Yuan et al. (2013) Butyrate Induced IGF2 Activation Correlated with Distinct Chromatin Signatures Due to Histone Modification. Gene Regul Syst Bio 7:57-70

Wu, Sitao; Li, Robert W; Li, Weizhong et al. (2012) Worm burden-dependent disruption of the porcine colon microbiota by Trichuris suis infection. PLoS One 7:e35470

Baldwin 6th, Ransom L; Wu, Sitao; Li, Weizhong et al. (2012) Quantification of Transcriptome Responses of the Rumen Epithelium to Butyrate Infusion using RNA-seq Technology. Gene Regul Syst Bio 6:67-80

Fu, Limin; Niu, Beifang; Zhu, Zhengwei et al. (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150-2

Li, Robert W; Wu, Sitao; Baldwin 6th, Ransom L et al. (2012) Perturbation dynamics of the rumen microbiota in response to exogenous butyrate. PLoS One 7:e29392

Li, Weizhong; Fu, Limin; Niu, Beifang et al. (2012) Ultrafast clustering algorithms for metagenomic sequence analysis. Brief Bioinform 13:656-68

Wu, Sitao; Li, Congjun; Huang, Wen et al. (2012) Alternative splicing regulated by butyrate in bovine epithelial cells. PLoS One 7:e39182

Wu, Sitao; Li, Robert W; Li, Weizhong et al. (2012) Transcriptome characterization by RNA-seq unravels the mechanisms of butyrate-induced epigenomic regulation in bovine cells. PLoS One 7:e36940

Li, Robert W; Wu, Sitao; Li, Weizhong et al. (2012) Alterations in the porcine colon microbiota induced by the gastrointestinal nematode Trichuris suis. Infect Immun 80:2150-7

Showing the most recent 10 out of 12 publications
