Microbial communities, or microbiomes, are an essential part of life on Earth. Microbiomes in the natural environment, including those associated with animals and plants, contain thousands of interacting microbial species. Microbial communities influence key aspects of host health and behavior. They drive basic biochemical processes, such as nutrient processing in the gut and sequestration of carbon in the Earth's oceans. The study of microbiomes has recently been revolutionized by the use of advanced sequencing technologies. However, large-scale sequencing initiatives, such as the Human Microbiome Project and the Earth Microbiome Project, are generating petabytes (10^15 bytes) of data, more than existing analysis tools can handle. The goal of this project is to develop transformative computational methods and implement software tools that enable the analysis of these very large datasets. Specifically, these tools will provide improved methods to organize community gene expression data (metatranscriptomes) into metabolic pathways, which in turn informs predictions of how biochemical processes transform matter and energy. To maximize their impact, the developed software tools will be made available to the research community as stand-alone open source packages and deployed on common cloud computing environments. The project will provide opportunities for mentoring undergraduate and graduate students at Georgia State University, University of Connecticut, and Georgia Tech, and will promote participation of women and underrepresented groups in bioinformatics research and empirical analysis of community-level sequence (DNA/RNA) datasets. Selected aspects of the proposed research will be incorporated in courses at the three universities and will form the basis of innovative curriculum and educational materials, including the creation of mobile applications.
This project brings together an interdisciplinary team of computer scientists and environmental microbiologists to develop and implement computational tools that enable de novo analysis of large multi-sample microbiome sequencing datasets, addressing current challenges in metatranscriptome assembly and inference of metabolic pathway activity. Specific aims of the project include: (i) developing highly scalable algorithms for de novo assembly and quantification from multiple metatranscriptomic samples, (ii) developing highly accurate algorithms for estimating metabolic pathway activity levels and testing for differential activity, and (iii) developing and validating prototype implementations of the developed methods. A distinguishing feature of the developed methods will be their ability to jointly analyze multiple related metatranscriptomic samples. This joint assembly and quantification paradigm is likely to find applications beyond microbiome research, e.g., in the emerging area of single-cell genomics. The results of the project, including software packages, research publications, and educational materials, will be made available at http://alan.cs.gsu.edu/NGS/?q=software and http://dna.engr.uconn.edu/?page_id=719