The genetic diversity of bacteria and archaea (the prokaryotes) is by far the largest among all living organisms. Whether in soils, waters, human guts, or the atmosphere, prokaryotes affect, if not control, all life-sustaining processes on Earth, yet how these microbes interact with and change their environment is not fully understood. This incomplete understanding is due, at least in part, to the fact that the great majority of microorganisms resist cultivation in the laboratory (the so-called uncultivable majority) and therefore cannot be studied efficiently. The past few years have seen an explosion of culture-independent genomic techniques (a.k.a. metagenomics), which allow the analysis of microorganisms and their communities in their natural habitats by sequencing their entire genomes or transcriptomes, bypassing the need for lab cultivation. However, the development of computational tools and algorithms to analyze metagenomic data lags behind developments in sequencing technologies. To advance the understanding of the uncultivable majority of microorganisms, and to take full advantage of society's investment in genomic technologies, new quantitative approaches are needed. The goals of this project are: 1) to develop new computational tools that fulfill critical research needs and thus help scientists understand the composition, functions, and value of microbial communities, and 2) to train faculty from undergraduate colleges, including community colleges, in new metagenomics techniques, which sit at the interface of microbiology, genomics, bioinformatics, and computational biology, a pivotal area of contemporary research and education that is inadequately covered in traditional curricula. These activities are therefore expected to provide important infrastructure for training the future workforce and to facilitate contemporary research.

The small subunit ribosomal RNA (SSU rRNA) gene has been used successfully to catalogue and study the diversity of microorganisms for the last two decades. This work has been facilitated by the development of dedicated resources (databases and tool repositories) such as the Ribosomal Database Project (RDP; http://rdp.cme.msu.edu). However, rRNA gene-based studies have important limitations that techniques based on genome sequences do not share. For instance, genomic techniques can better resolve microbial communities at the levels where the SSU rRNA gene provides inadequate resolution, namely the species level and finer, and can catalogue whole-genome diversity and fluidity, which are relevant for nutrient cycling, bioremediation efforts, and the emergence of microbial antibiotic resistance. This project seeks to develop tools that overcome several of the limitations of rRNA gene-based approaches and allow the efficient analysis of microbiomes. The project will provide robust implementations of both well-accepted existing methods, such as genome-aggregate average nucleotide identity (gANI) for delineating closely related species and strains, and newer methods, including the recently developed Nonpareil method for estimating the coverage of a microbial community achieved by a metagenomic dataset and the MyTaxa method for examining horizontal gene transfer events between microbial lineages. The overarching objective is to develop the genome equivalent of the RDP, enabling the scientific community to perform classification and diversity studies at the genome level.
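
As a rough illustration of the idea behind an average nucleotide identity measure, the toy sketch below averages per-fragment identity over pairs of sequence fragments from two genomes. It is not the project's gANI implementation: it assumes the fragments are already aligned and of equal length, whereas real tools derive alignments from a sequence search between genomes. The function names and the `min_identity` cutoff are hypothetical and chosen only for this example.

```python
from statistics import mean


def fragment_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length fragments."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_nucleotide_identity(pairs, min_identity=0.7):
    """Mean identity over fragment pairs that pass a cutoff.

    min_identity is a hypothetical filter that discards pairs too divergent
    to be treated as matching regions; it is an assumption of this sketch.
    """
    identities = [fragment_identity(a, b) for a, b in pairs]
    kept = [i for i in identities if i >= min_identity]
    if not kept:
        raise ValueError("no fragment pair passed the cutoff")
    return mean(kept)


# Made-up fragment pairs, for illustration only.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical fragments
    ("ACGTACGTAC", "ACGTACGAAC"),  # one mismatch out of ten positions
    ("ACGTACGTAC", "TTTTTTTTTT"),  # highly divergent, dropped by the cutoff
]
ani = average_nucleotide_identity(pairs)
print(f"toy ANI: {ani:.2f}")
```

Two genomes with a high average identity over their shared fragments would be candidates for the same species or strain group; where to draw that line is a separate decision that this sketch does not make.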

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1356380
Program Officer: Peter McCartney
Budget Start: 2014-07-01
Budget End: 2019-06-30
Fiscal Year: 2013
Total Cost: $552,577
Name: Michigan State University
City: East Lansing
State: MI
Country: United States
Zip Code: 48824