While metagenomics can reveal the genetic composition (and therefore the genetic potential) of microbial communities, other meta-omic techniques (e.g., metatranscriptomics and metaproteomics) can provide additional insights into the functional characteristics of these communities, such as gene activities and their regulation mechanisms. Analyzing these functional microbiome data sets raises new computational challenges. In this application, we propose novel approaches to metatranscriptomic and metaproteomic data analysis that use de Bruijn graph representations of metagenome assemblies as the reference, enabling an integrated analysis of meta-omic data sets. The advantages of using de Bruijn graphs are that: 1) they provide a compact representation of metagenomes (which are likely to be redundant) and allow direct computation on the graph; 2) they naturally capture genomic variations; and 3) they capture the "ambiguous" connectivity between contigs/scaffolds, which can be resolved in subsequent steps using additional information, or utilized to achieve better identification and quantification of genes and proteins using metatranscriptomic and metaproteomic data, respectively. We will apply our new tools to analyze functional human microbiome data sets, including those to be generated from HMP phase II projects.
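To make the first two advantages concrete, the following is a minimal illustrative sketch, not the project's actual software. It builds a small de Bruijn graph from two toy sequences that differ by a single base. The function names (`build_de_bruijn`, `branching_nodes`), the toy sequences, and the choice of k = 4 are all assumptions made for illustration. Shared k-mers are stored once (compactness), and the single-base variant shows up as a branching node in the graph.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it.

    Hypothetical helper for illustration: every k-mer in the input
    contributes one edge from its prefix to its suffix, and identical
    k-mers from different sequences collapse into the same edge.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Return nodes with more than one successor (candidate variation sites)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two toy "strains" that differ at one position (T vs. A); assumed data.
strain_a = "ACGTTGCA"
strain_b = "ACGTAGCA"

graph = build_de_bruijn([strain_a, strain_b], k=4)
branches = branching_nodes(graph)
edge_count = sum(len(successors) for successors in graph.values())

print("branching nodes:", branches)
print("distinct edges:", edge_count)
```

In this toy case the two sequences contain ten k-mers in total, but the graph stores only the distinct ones, and the node where the paths diverge marks the variant. Real metagenome assembly graphs are far larger and must also handle reverse complements and sequencing errors, which this sketch ignores.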

Public Health Relevance

We propose to develop graph-centric approaches to metatranscriptomic and metaproteomic data analysis. These approaches will be a timely addition to the computational tools that are central to the interpretation and integration of metagenomic and other functional microbiome data, leading to a better understanding of the functionality and dynamics of microbial communities and of their responses to environmental changes, e.g., the health conditions of their human hosts.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 1R01AI108888-01A1
Application #: 8760378
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Yao, Alison Q
Project Start: 2014-08-01
Project End: 2018-07-31
Budget Start: 2014-08-01
Budget End: 2015-07-31
Support Year: 1
Fiscal Year: 2014
Total Cost: $340,446
Indirect Cost: $115,446

Name: Indiana University Bloomington
Department:
Type: Other Domestic Higher Education
DUNS #: 006046700
City: Bloomington
State: IN
Country: United States
Zip Code: 47401

Zhang, Quan; Ye, Yuzhen (2017) Not all predicted CRISPR-Cas systems are equal: isolated cas genes and classes of CRISPR like elements. BMC Bioinformatics 18:92
Han, Wontack; Wang, Mingjie; Ye, Yuzhen (2017) A concurrent subtractive assembly approach for identification of disease associated sub-metagenomes. Res Comput Mol Biol 2017:18-33
Jiao, Dazhi; Han, Wontack; Ye, Yuzhen (2017) Functional association prediction by community profiling. Methods 129:8-17
Ye, Yuzhen; Zhang, Quan (2016) Characterization of CRISPR RNA transcription by exploiting stranded metatranscriptomic data. RNA 22:945-56
Li, Sujun; Tang, Haixu (2016) Computational Methods in Mass Spectrometry-Based Proteomics. Adv Exp Med Biol 939:63-89
Tang, Haixu; Li, Sujun; Ye, Yuzhen (2016) A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics. PLoS Comput Biol 12:e1005224
Ye, Yuzhen; Tang, Haixu (2016) Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics 32:1001-8
Wang, Mingjie; Doak, Thomas G; Ye, Yuzhen (2015) Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes. Genome Biol 16:243
Bao, Guanhui; Wang, Mingjie; Doak, Thomas G et al. (2015) Strand-specific community RNA-seq reveals prevalent and dynamic antisense transcription in human gut microbiota. Front Microbiol 6:896