High-throughput next-generation sequencing technologies provide a powerful way to detect biological threats in metagenomic samples taken directly from the environment, without prior knowledge of sample composition. By analyzing metatranscriptomic data sets, researchers can examine the active gene functions and pathways in environmental or host-associated samples and compare them between samples with and without biological threat agents (organisms or viruses). This is accomplished by identifying which genes are active in a sample and characterizing which functional patterns are associated with the presence of biothreat agents. Moreover, functional analysis of metagenomes can explore how the functional diversity of microbial communities correlates with important biological factors of interest, including the presence of a particular threat organism and its virulence level. In this research the investigators build rigorous statistical models and rapid computational algorithms to define detectable signatures of biological threats based on metatranscriptomic sequencing data. In particular, they aim to (1) develop a probabilistic framework for characterizing the gene content of a single metatranscriptomic sample, with sequencing errors taken into account; (2) compare multiple metatranscriptomic samples to detect statistically significant functional patterns that are associated with a biothreat agent; (3) identify "threat markers" based on the functional patterns that are linked to the presence of a biothreat agent and its virulence level, for which a novel statistical approach to the high-dimensional variable selection problem will be proposed; and (4) develop an R software package, FunctionMeta, implementing the statistical models and computational algorithms. In addition, a standalone software tool, FunctionSim, will be developed for generating synthetic sequencing data.
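
Aim (2), comparing samples to detect functional patterns associated with a biothreat agent, can be illustrated with a small sketch. This is not the investigators' method and not the FunctionMeta interface, neither of which is specified in this abstract. It assumes that reads have already been assigned to gene families, pools them into two hypothetical groups (agent present, agent absent), and swaps in two standard techniques: a per-family Fisher exact test followed by Benjamini-Hochberg adjustment. All function names, family names, and counts below are invented for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, tail)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def compare_groups(counts_with, counts_without):
    """Test each gene family for a different share of reads in the two groups."""
    total_with = sum(counts_with.values())
    total_without = sum(counts_without.values())
    families = sorted(set(counts_with) | set(counts_without))
    pvals = []
    for fam in families:
        a = counts_with.get(fam, 0)
        c = counts_without.get(fam, 0)
        pvals.append(fisher_exact_two_sided(a, total_with - a, c, total_without - c))
    return dict(zip(families, benjamini_hochberg(pvals)))

# Invented read counts per gene family (hypothetical family names).
counts_with = {"toxin_synthesis": 30, "ribosomal": 50, "transport": 20}
counts_without = {"toxin_synthesis": 5, "ribosomal": 55, "transport": 40}

q_values = compare_groups(counts_with, counts_without)
for family, q in q_values.items():
    print(f"{family}: adjusted p = {q:.3g}")
```

Pooling reads per group ignores sample-to-sample variability and sequencing error, both of which the project's probabilistic framework is meant to address, so this sketch should be read only as an illustration of the multiple-testing setting.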

Known or newly emerging infectious agents, whether they occur naturally or are dispersed intentionally, are a potential threat that our modern, globalized society may have to face. In this research the investigators develop novel statistical and computational methodologies for rapid detection of biological threats based on metatranscriptomic sequencing data. Moreover, the algorithms are applicable to other functional metagenomics studies. Both the R software package and the sequence simulator tool will be made publicly available to the research community. Besides training graduate students and postdocs in cutting-edge statistical and interdisciplinary research, the project will develop an online teaching module (posted as a series of University iTunes videos) that gives high school students an opportunity to learn the new science of metagenomics and its applications in forensics and environmental biology, with emphasis on the role of statistics in biological and health science research.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
1222592
Program Officer
Leland Jameson
Project Start
Project End
Budget Start
2012-08-15
Budget End
2016-12-31
Support Year
Fiscal Year
2012
Total Cost
$722,978
Indirect Cost
Name
University of Arizona
Department
Type
DUNS #
City
Tucson
State
AZ
Country
United States
Zip Code
85719