Rapid and accurate detection of biothreats is important not only for containing their potential damage, but also for determining potential medical remedies. Extensive research shows that certain genes in infected cells have different mRNA expression levels for different pathogens. Thus, accurately identifying the genes that react to pathogens and accurately quantifying their expression variations are key steps in early biothreat detection. The emerging RNA-Seq technologies provide tens of millions of short sequence reads of the expressed genes, which, after mapping to the genome, can be converted to represent gene expression levels. However, the conversion from sequence reads to gene expression levels is still problematic. In this project, the investigator and her colleagues will tackle this problem by modeling RNA-Seq data through a broad class of flexible nonlinear models, called sufficient dimension reduction (SDR) models; proposing novel variable selection methods for SDR models; and developing the theoretical underpinnings of the effectiveness of the proposed methods. As a consequence, this effort will result in a powerful software suite for estimating gene expression levels from RNA-Seq data and identifying marker genes that react to specific pathogens in a unified framework.

This project not only addresses emerging issues in biothreat detection using high-throughput sequencing technologies, but also results in novel statistical methods and theory broadly applicable to general statistical learning and prediction problems. More specifically, the proposed work will (i) produce new methodologies for analyzing ultra-high dimensional data, (ii) inspire new lines of quantitative investigation in genomics, and (iii) offer a unique educational experience for both undergraduate and graduate students to participate in cutting-edge statistical and interdisciplinary research.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1120368
Program Officer
Leland Jameson
Project Start
Project End
Budget Start
2011-08-15
Budget End
2015-07-31
Support Year
Fiscal Year
2011
Total Cost
$309,495
Indirect Cost
Name
Harvard University
Department
Type
DUNS #
City
Cambridge
State
MA
Country
United States
Zip Code
02138