While metagenomics can reveal the genetic composition (and therefore the genetic potential) of microbial communities, other meta-omic techniques (e.g., metatranscriptomics and metaproteomics) can provide additional insights into the functional characteristics of these communities, such as gene activities and their regulatory mechanisms. Analyzing these functional microbiome data sets raises new computational challenges. In this application, we propose novel approaches to metatranscriptomic and metaproteomic data analysis that use de Bruijn graph representations of metagenome assemblies as the reference, enabling an integrated analysis of meta-omic data sets. The advantages of using de Bruijn graphs include: 1) they provide a compact representation of metagenomes (which are likely to be redundant) and allow direct computation on the graph; 2) they naturally capture genomic variations; and 3) they capture the "ambiguous" connectivity between contigs/scaffolds, which can either be resolved in subsequent steps using additional information or be exploited to improve the identification and quantification of genes and proteins from metatranscriptomic and metaproteomic data, respectively. We will apply our new tools to the analysis of functional human microbiome data sets, including those to be generated by Human Microbiome Project (HMP) Phase II projects.
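
The three properties listed above can be illustrated with a toy de Bruijn graph. The following Python sketch is a simplified illustration under stated assumptions, not the authors' implementation: nodes are (k-1)-mers, edges are k-mers, and the function names (`build_dbg`, `branching_nodes`, `kmer_hits`), the two hypothetical strain sequences (identical except for one T-to-C substitution), and k = 4 are all invented for this example. Shared k-mers collapse into single edges (compactness), the substitution shows up as a bubble (variation), and the nodes where paths split or merge mark the kind of ambiguous connectivity that a contig-based reference would break apart. The last function counts exact k-mer matches of a read against graph edges, a crude stand-in for mapping metatranscriptomic reads onto the graph.

```python
from collections import defaultdict


def build_dbg(seqs, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Identical k-mers from redundant sequences collapse into a single edge,
    which is what makes the graph compact. Edge multiplicities are ignored.
    """
    edges = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
    return edges


def branching_nodes(edges):
    """Return (k-1)-mers with more than one successor or predecessor.

    These nodes mark ambiguous connectivity, e.g. the start and end of a
    bubble caused by a variant that distinguishes two strains.
    """
    indeg = defaultdict(int)
    for vs in edges.values():
        for v in vs:
            indeg[v] += 1
    nodes = set(edges) | set(indeg)
    return sorted(n for n in nodes if len(edges.get(n, ())) > 1 or indeg[n] > 1)


def kmer_hits(read, edges, k):
    """Count the k-mers of a read that exist as edges in the graph."""
    return sum(
        1
        for i in range(len(read) - k + 1)
        if read[i + 1:i + k] in edges.get(read[i:i + k - 1], ())
    )


# Hypothetical example: two strains differing by a single substitution.
strain_a = "ATGGCGTACTTA"
strain_b = "ATGGCGCACTTA"  # T -> C at position 6
graph = build_dbg([strain_a, strain_b], k=4)
print(sum(len(v) for v in graph.values()))  # 13 distinct 4-mers, not 18
print(branching_nodes(graph))  # ['ACT', 'GCG']: the bubble around the variant
print(kmer_hits("GGCGCACT", graph, 4))  # 5: every 4-mer of this read is in the graph
```

Real metagenome assembly graphs typically also merge reverse complements, compact unbranched paths into unitigs, and must tolerate sequencing errors; the sketch omits all of these.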

Public Health Relevance

We propose to develop graph-centric approaches to metatranscriptomic and metaproteomic data analysis. These approaches will be a timely addition to the computational tools that are central to the interpretation and integration of metagenomic and other functional microbiome data. They will lead to a better understanding of the functionality and dynamics of microbial communities and of their responses to environmental changes, such as changes in the health conditions of their human hosts.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Yao, Alison Q
Indiana University Bloomington
Other Domestic Higher Education
United States

Publications

Jiao, Dazhi; Han, Wontack; Ye, Yuzhen (2017) Functional association prediction by community profiling. Methods 129:8-17
Zhang, Quan; Ye, Yuzhen (2017) Not all predicted CRISPR-Cas systems are equal: isolated cas genes and classes of CRISPR like elements. BMC Bioinformatics 18:92
Han, Wontack; Wang, Mingjie; Ye, Yuzhen (2017) A concurrent subtractive assembly approach for identification of disease associated sub-metagenomes. Res Comput Mol Biol 2017:18-33
Ye, Yuzhen; Zhang, Quan (2016) Characterization of CRISPR RNA transcription by exploiting stranded metatranscriptomic data. RNA 22:945-56
Li, Sujun; Tang, Haixu (2016) Computational Methods in Mass Spectrometry-Based Proteomics. Adv Exp Med Biol 939:63-89
Tang, Haixu; Li, Sujun; Ye, Yuzhen (2016) A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics. PLoS Comput Biol 12:e1005224
Ye, Yuzhen; Tang, Haixu (2016) Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics 32:1001-8
Wang, Mingjie; Doak, Thomas G; Ye, Yuzhen (2015) Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes. Genome Biol 16:243
Bao, Guanhui; Wang, Mingjie; Doak, Thomas G et al. (2015) Strand-specific community RNA-seq reveals prevalent and dynamic antisense transcription in human gut microbiota. Front Microbiol 6:896