Revolutionary new technologies are producing high-throughput biological data at a resolution that was unthinkable only a decade ago. These new forms of data present both enormous challenges and enormous opportunities for statisticians and computer scientists. This project develops sophisticated new statistical methods and computational algorithms for analyzing and integrating complex high-dimensional data. The work is motivated by collaborations with leading biological scientists at Cornell-Ithaca and Weill Cornell Medical College, who work in diverse research areas including plant biology, nutrition, neurology, cancer epigenomics, and veterinary medicine.

The goal of this project is to develop new statistical models and computational algorithms for high-dimensional, low-sample-size, high-throughput biological data, including new methods for the analysis of microarrays, the identification of quantitative trait loci, association mapping, label-free shotgun proteomics, and metabolomics. The proposed methods involve innovative extensions of modern statistical building blocks, including the use of random effects for regularization, shrinkage estimation, Bayesian statistics, and mixtures for posterior classification and prediction. Novel modifications of the expectation-maximization algorithm are proposed for scalable and efficient model fitting and inference.
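As a rough illustration of two of the building blocks named above (mixtures for posterior classification, fitted by expectation-maximization), the sketch below runs the textbook EM algorithm on a two-component Gaussian mixture of per-feature scores. It is not the project's method and does not include any of the proposed modifications; the function names, the initialization rule, and the interpretation of the components as "null" and "non-null" features are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_nonnull(scores, pi1, mu0, sd0, mu1, sd1):
    """E-step: posterior probability that each score came from component 1."""
    post = []
    for x in scores:
        a = pi1 * normal_pdf(x, mu1, sd1)
        b = (1.0 - pi1) * normal_pdf(x, mu0, sd0)
        total = a + b
        # Guard against numerical underflow far in the tails.
        post.append(a / total if total > 0.0 else 0.5)
    return post


def fit_two_component_mixture(scores, n_iter=200):
    """Textbook EM for the mixture (1 - pi1) * N(mu0, sd0^2) + pi1 * N(mu1, sd1^2).

    Returns the fitted parameters and the posterior probabilities, in the
    same order as `scores`. Hypothetical helper written for this example.
    """
    n = len(scores)
    ordered = sorted(scores)
    # Crude initialization (an assumption): component 0 at the median,
    # component 1 at the upper decile, both with the overall spread.
    mu0 = ordered[n // 2]
    mu1 = ordered[(9 * n) // 10]
    mean = sum(scores) / n
    sd0 = sd1 = max(math.sqrt(sum((x - mean) ** 2 for x in scores) / n), 1e-6)
    pi1 = 0.5

    for _ in range(n_iter):
        post = posterior_nonnull(scores, pi1, mu0, sd0, mu1, sd1)
        # M-step: weighted proportion, means, and standard deviations.
        w1 = max(sum(post), 1e-12)
        w0 = max(n - w1, 1e-12)
        pi1 = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
        mu1 = sum(p * x for p, x in zip(post, scores)) / w1
        mu0 = sum((1.0 - p) * x for p, x in zip(post, scores)) / w0
        sd1 = max(math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(post, scores)) / w1), 1e-6)
        sd0 = max(math.sqrt(sum((1.0 - p) * (x - mu0) ** 2 for p, x in zip(post, scores)) / w0), 1e-6)

    params = {"pi1": pi1, "mu0": mu0, "sd0": sd0, "mu1": mu1, "sd1": sd1}
    return params, posterior_nonnull(scores, **params)
```

Calling `fit_two_component_mixture(scores)` on a list of per-feature test statistics returns the fitted parameters together with one posterior probability per feature; a feature could then be classified as non-null when its posterior exceeds a chosen threshold such as 0.5. Real high-throughput data would typically call for a more careful null model and initialization than this sketch uses.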

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1611893
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2016-08-15
Budget End: 2020-07-31
Support Year:
Fiscal Year: 2016
Total Cost: $200,000
Indirect Cost:
Name: Cornell University
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14850