The ability to analyze complex samples containing collections of microbes will significantly improve our ability to detect and assess biological threats. The rapid growth of sequencing technologies makes it possible to obtain DNA sequences in amounts sufficient for the analysis of complex samples. What is needed now are computational analysis tools that allow the exploration and classification of sequences from microbiomes. Current computational metagenomic approaches cannot support this analysis. Genome assembly programs yield fragmented metagenome assemblies, and comparative taxonomic assignment methods generate phylogenetic results biased towards the known minority of the microbial world. The work conducted in this project remedies these shortcomings by approaching the taxonomic and assembly goals jointly and by providing a set of algorithms explicitly developed to function in the complex context of microbiomes. These algorithms are useful for exploring both environmental and human microbiomes. Compositional sampling of the microbial communities that populate the human GI tract, urogenital system, and nasal and oral cavities has revealed their diversity and their promise as sensitive indicators of disease and of response to therapeutic interventions. Similarly, compositional sampling of microbiomes in environmental samples can provide warnings of short-term and long-term biological threats. The algorithms developed under this project can be used with second-generation and prospective third-generation sequencing technologies to rapidly and robustly answer questions about what a microbiome contains and to perform de novo metagenome assembly.

A major threat to human health is the possibility that harmful microorganisms will be introduced, either deliberately or accidentally, into our environment and food supply. One of the difficulties in detecting this threat efficiently is that our environment is already rich in microorganisms that are harmless or actively beneficial and about which we know very little. We do not know how many different species of bacteria live with us or what these bacteria look like. Before we can determine whether there is an unknown organism in our environment, we need to be able to determine which bacteria inhabit our bodies and our surroundings. Until recently this was not possible because, with the technologies then available, identifying a bacterium required culturing it, that is, growing it in isolation in the laboratory. Only a very small fraction of bacteria can be cultured. Most bacteria live in dynamic environments that are difficult to replicate in the laboratory, and they live with other bacteria in microbiomes. Recently, however, techniques have been developed that allow us to sequence pieces of DNA from collections of microbes. In this project we take the short snippets of DNA generated by current technology and develop algorithms to separate the contributions of the different bacteria that make up particular colonies. We then assemble these snippets into longer contiguous segments, which permit a classification of these bacteria in terms of their similarity to known organisms. The techniques we are developing will allow us to create a more complete picture of our environment, which in turn will give us the ability to determine when an unknown organism is introduced into it.
A more complete picture of the organisms that live within us, such as the bacteria in our digestive system, will also help us develop procedures to diagnose diseases based on the composition of bacterial colonies and will indicate approaches for the development of therapies.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1043089
Program Officer: Leland Jameson
Project Start:
Project End:
Budget Start: 2011-09-15
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $246,367
Indirect Cost:
Name: University of Nebraska-Lincoln
Department:
Type:
DUNS #:
City: Lincoln
State: NE
Country: United States
Zip Code: 68503