CRII: SCH: Accelerating Human Microbiome Analysis using Lightning-Fast Cloud Computing

Ahn, Tae Hyuk

Abstract

Humans carry ten times more bacterial cells than human cells, and those bacteria carry a hundred times more genes than the inherited human genome. Human microbes also hold clues to maintaining health and preventing disease. Over the last decade, cultivation-independent metagenomics, in which all microorganisms in a sample are sequenced together directly, has been applied intensively to understand the impact of microbes on human health. A new generation of sequencing technologies has accelerated this research but has left a vast amount of metagenomic sequencing data to be analyzed, and the software and high-performance computing systems that could speed the analysis are still lacking. The PI proposes to develop novel computational algorithms and cloud computing software to decipher terabytes of metagenomic sequencing data for studying the human microbiome. Experience gained from this work will accelerate development of the proposed tools for better understanding the ecosystem in our bodies. Ultimately, this may contribute to better diagnosis, prevention, and treatment of disease. Furthermore, the proposed cloud computing algorithms and techniques could be adapted to many other applications with high computational complexity. A key component of the proposal is offering graduate and undergraduate computer science students a unique opportunity for interdisciplinary research in designing algorithms and software to solve biological problems.

The project proposes novel computational algorithms and a cloud computing software tool for analyzing large-scale metagenomic sequencing data to study the human microbiome. The software would be built on Apache Spark, an open-source cluster computing framework for large-scale data processing that supports a rich set of high-level tools, including scalable machine learning and graph processing libraries. The primary novelty is a cloud-scalable de novo assembler, combined with the ability to compare assembled sequences against existing reference genomes using Spark libraries. This approach will speed the identification of novel genomes and of the microbial composition of large metagenomic data sets. By contrast, most existing metagenomic analysis methods run de novo sequence assembly and taxonomic classification against many existing reference genomes as separate steps. The key technical innovations of the proposed work are (i) cloud computing algorithms that enable a fast and scalable metagenome assembler, (ii) passing assembled sequences directly to taxonomic classification to dramatically reduce computation time, and (iii) a cloud container package that allows researchers to analyze metagenomic data easily and cheaply. Providing a cloud container package with a simple Web interface will enable researchers to analyze their large-scale metagenomic sequence data readily and quickly for human health, biosurveillance, and pan-genomic analysis of microbiota.
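
The abstract does not describe the assembler or the classifier in any detail, so the following is only a minimal sketch of the general idea, not the PI's method. It assumes a k-mer-based approach: reads are split into overlapping k-mers and counted (the same map-then-reduce-by-key shape that Spark expresses with `flatMap` and `reduceByKey`), and an assembled sequence is assigned to whichever reference genome shares the largest fraction of its k-mers. The function names, the toy sequences, and the choice of k are all hypothetical, and the sketch uses plain Python rather than Spark so that it runs without a cluster.

```python
from collections import Counter
from itertools import chain


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Map each read to its k-mers, then reduce by key into counts.

    A Counter stands in here for a distributed reduce-by-key step.
    """
    return Counter(chain.from_iterable(kmers(read, k) for read in reads))


def classify(contig, references, k):
    """Return (name, score) for the reference sharing the most k-mers.

    score is the fraction of the contig's distinct k-mers that also
    occur in the reference. Returns (None, 0.0) if nothing is shared.
    This is an illustrative heuristic, not the proposal's classifier.
    """
    contig_kmers = set(kmers(contig, k))
    best_name, best_score = None, 0.0
    if not contig_kmers:
        return best_name, best_score
    for name, genome in references.items():
        shared = contig_kmers & set(kmers(genome, k))
        score = len(shared) / len(contig_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


if __name__ == "__main__":
    # Hypothetical toy data, chosen only to exercise the functions.
    reads = ["ACGTAC", "CGTACG"]
    references = {"genome_A": "ACGTACGTTT", "genome_B": "GGGGCCCCAA"}
    print(count_kmers(reads, 3))
    print(classify("CGTACG", references, 3))
```

In a real pipeline the k-mer counts would feed a graph-based assembly step, and the classification would run over many contigs and reference genomes in parallel; neither of those parts is shown here.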

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1566292
Program Officer: Wendy Nilsen

Budget Start: 2016-04-01
Budget End: 2019-03-31
Fiscal Year: 2015
Total Cost: $174,174

Institution: Saint Louis University, St Louis, MO, United States
