Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research

Sun, Yijun

Abstract

We propose a three-year interdisciplinary research plan to address two key issues currently facing the metagenomics community.

The first issue concerns the accurate construction and annotation of OTU tables from millions of 16S rRNA sequences, which is one of the most important yet most difficult problems in microbiome data analysis. The field currently lacks computational algorithms capable of handling extremely large sequence data and constructing biologically consistent OTU tables. We propose a novel method that performs OTU table construction and annotation simultaneously by utilizing input and reference sequences, reference annotations, and data clustering structure within one analytical framework. Dynamic data-driven cutoffs are derived to identify OTUs that are consistent not only with data clustering structure but also with reference annotations. When successfully implemented, our method will address the computational needs of processing the hundreds of millions of 16S rRNA reads that are currently being generated by large-scale studies.

The second issue concerns developing novel methods to extract pertinent information from massive sequence data, thereby helping the field shift from descriptive research to mechanistic studies. We are particularly interested in microbial community dynamics analysis, which can provide a wealth of insight into disease development that is unattainable through a static experimental design, and which lays a critical foundation for developing probiotic and antibiotic strategies to manipulate microbial communities. Traditionally, system dynamics is approached through time-course studies. However, due to economic and logistical constraints, time-course studies are generally limited in the number of samples examined and the time period followed. With the rapid development of sequencing technology, many thousands of samples are being collected in large-scale studies. This provides a unique opportunity to develop a novel analytical strategy that uses static data, instead of time-course data, to study microbial community dynamics. To our knowledge, this is the first time that massive static data are used to study dynamic aspects of microbial communities. When successfully implemented, our approach can effectively overcome the sampling limitation of time-course studies and open a new avenue of research into the microbial dynamics underlying disease development without requiring a resource-intensive time-course study.

The proposed pipeline will be intensively tested on a large oral microbiome dataset consisting of ~2,600 subgingival samples (~330M reads). The analysis can significantly advance our understanding of the dynamic behaviors of oral microbial communities that may contribute to the development of periodontal disease. To our knowledge, no prior work has been performed on this scale to study oral microbial community dynamics. We have assembled a multidisciplinary team with expertise spanning machine learning, bioinformatics, and oral microbiology. The expected outcome of this work is a set of computational tools of high utility for the microbiology community and beyond.
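
For readers unfamiliar with the term, the following minimal Python sketch illustrates what an OTU table is: a count of reads per sample per cluster of similar sequences. It is not the method proposed in this grant. It assumes a single fixed identity cutoff and greedy first-match assignment on equal-length toy sequences, whereas the abstract describes dynamic, data-driven cutoffs that also take reference annotations into account. The function names, the toy reads, and the cutoff value are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def identity(a, b):
    """Fraction of matching positions between two equal-length toy sequences.

    Real pipelines use pairwise alignment; this is a simplifying assumption.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, cutoff):
    """Assign each (sample, sequence) read to the first centroid whose
    identity is at least `cutoff`; otherwise start a new OTU.

    Returns the list of centroid sequences and a nested dict
    {sample: {otu_index: read_count}}, i.e. a sparse OTU table.
    """
    centroids = []
    table = defaultdict(Counter)
    for sample, seq in reads:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= cutoff:
                break  # read joins an existing OTU
        else:
            # No centroid was close enough: this read seeds a new OTU.
            centroids.append(seq)
            idx = len(centroids) - 1
        table[sample][idx] += 1
    return centroids, {s: dict(c) for s, c in table.items()}


# Hypothetical toy input: (sample ID, sequence) pairs.
reads = [
    ("S1", "ACGTACGTAC"),
    ("S1", "ACGTACGTAA"),  # differs from the first read at one position
    ("S2", "TTTTGGGGCC"),
    ("S2", "ACGTACGTAC"),
]

centroids, otu_table = greedy_otus(reads, cutoff=0.9)
for sample, counts in sorted(otu_table.items()):
    print(sample, counts)
```

Changing the fixed `cutoff` changes which reads are merged into the same OTU, which is one reason a single global threshold can disagree with the underlying clustering structure or with reference annotations.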

Public Health Relevance

The human microbiome plays essential roles in many important physiological processes. We propose an interdisciplinary research plan to address some of the major computational challenges in current microbiome research. If successfully implemented, this work could substantially expand the capacity of existing pipelines for large-scale data analysis and scientific discovery, with a significant impact on the field.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 5R01AI125982-02
Application #: 9270498
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brown, Liliana L

Project Start: 2016-05-15
Project End: 2019-04-30
Budget Start: 2017-05-01
Budget End: 2018-04-30
Support Year: 2
Fiscal Year: 2017

Institution

Name: State University of New York at Buffalo
Department: Microbiology/Immun/Virology
Type: Schools of Medicine
DUNS #: 038633251

City: Amherst
State: NY
Country: United States
Zip Code: 14228

Related projects


NIH 2018 R01 AI	Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research Sun, Yijun / State University of New York at Buffalo
NIH 2017 R01 AI	Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research Sun, Yijun / State University of New York at Buffalo
NIH 2016 R01 AI	Developing Advanced Algorithms to Address Major Computational Challenges in Current Microbiome Research Sun, Yijun / State University of New York at Buffalo

Publications

McAdams, Natalie M; Simpson, Rachel M; Chen, Runpu et al. (2018) MRB7260 is essential for productive protein-RNA interactions within the RNA editing substrate binding complex during trypanosome RNA editing. RNA 24:540-556

Banack, Hailey R; Genco, Robert J; LaMonte, Michael J et al. (2018) Cohort profile: the Buffalo OsteoPerio microbiome prospective cohort study. BMJ Open 8:e024263

Tutino, Vincent M; Poppenberg, Kerry E; Jiang, Kaiyu et al. (2018) Circulating neutrophil transcriptome may reveal intracranial aneurysm signature. PLoS One 13:e0191407

Furuya, Hideki; Tamashiro, Paulette M; Shimizu, Yoshiko et al. (2017) Sphingosine Kinase 1 expression in peritoneal macrophages is required for colon carcinogenesis. Carcinogenesis 38:1218-1227

Sun, Yijun; Yao, Jin; Yang, Le et al. (2017) Computational approach for deriving cancer progression roadmaps from static sample data. Nucleic Acids Res 45:e69

Cai, Yunpeng; Zheng, Wei; Yao, Jin et al. (2017) ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time. PLoS Comput Biol 13:e1005518

Mao, Qi; Wang, Li; Tsang, Ivor W et al. (2017) Principal Graph and Structure Learning Based on Reversed Graph Embedding. IEEE Trans Pattern Anal Mach Intell 39:2227-2241

Scharf, Michael E; Cai, Yunpeng; Sun, Yijun et al. (2017) A meta-analysis testing eusocial co-option theories in termite gut physiology and symbiosis. Commun Integr Biol 10:e1295187

Yacoub, Rabi; Nugent, Melinda; Cai, Weijin et al. (2017) Advanced glycation end products dietary restriction effects on bacterial gut microbiota in peritoneal dialysis patients; a randomized open label controlled trial. PLoS One 12:e0184789

Simpson, Rachel M; Bruno, Andrew E; Chen, Runpu et al. (2017) Trypanosome RNA Editing Mediator Complex proteins have distinct functions in gRNA utilization. Nucleic Acids Res 45:7965-7983
