Revolutionary technologies are producing high-throughput biological data at a resolution that was unthinkable only a decade ago. These new forms of data pose enormous challenges and opportunities for statisticians and computer scientists. This project develops new statistical methods and computational algorithms for analyzing and integrating complex, high-dimensional data. The work is motivated by collaborations with leading biological scientists at Cornell-Ithaca and Weill Cornell Medical College who work in diverse research areas, including plant biology, nutrition, neurology, cancer epigenomics, and veterinary medicine.
The goal of this project is to develop new statistical models and computational algorithms for high-dimensional, low-sample-size, high-throughput biological data. These include new methods for the analysis of microarrays, the identification of quantitative trait loci, association mapping, and label-free shotgun proteomics and metabolomics. The proposed methods are innovative extensions of modern statistical building blocks, including random effects for regularization, shrinkage estimation, Bayesian statistics, and mixture models for posterior classification and prediction. Novel modifications of the expectation-maximization (EM) algorithm are proposed to make model fitting and inference scalable and efficient.
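To make "mixtures for posterior classification" and "expectation-maximization" concrete, here is a minimal sketch of the standard, textbook EM algorithm for a two-component one-dimensional Gaussian mixture. It is not the project's modified EM; the function names (`fit_two_component_mixture`, `e_step`), the starting values, and the simulated data are all illustrative assumptions. The E-step computes each observation's posterior probability of belonging to component 1, and the M-step re-estimates the mixing weight, means, and standard deviations from those weighted memberships.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def e_step(x, pi, mu, sigma):
    """Return posterior P(component 1 | x_i) for each point and the log-likelihood."""
    resp, loglik = [], 0.0
    for v in x:
        p0 = (1 - pi) * normal_pdf(v, mu[0], sigma[0])
        p1 = pi * normal_pdf(v, mu[1], sigma[1])
        resp.append(p1 / (p0 + p1))
        loglik += math.log(p0 + p1)
    return resp, loglik


def fit_two_component_mixture(x, max_iter=500, tol=1e-8, min_sd=1e-6):
    """Fit a two-component 1-D Gaussian mixture by plain (textbook) EM.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Crude starting values (an assumption): equal weights, means at the data extremes.
    pi, mu, sigma = 0.5, [min(x), max(x)], [sd, sd]
    prev = -math.inf
    for _ in range(max_iter):
        resp, loglik = e_step(x, pi, mu, sigma)  # E-step
        n1 = sum(resp)
        n0 = n - n1
        # M-step: responsibility-weighted maximum-likelihood updates.
        pi = n1 / n
        mu = [sum((1 - r) * v for r, v in zip(resp, x)) / n0,
              sum(r * v for r, v in zip(resp, x)) / n1]
        sigma = [
            max(min_sd, math.sqrt(sum((1 - r) * (v - mu[0]) ** 2 for r, v in zip(resp, x)) / n0)),
            max(min_sd, math.sqrt(sum(r * (v - mu[1]) ** 2 for r, v in zip(resp, x)) / n1)),
        ]
        # EM never decreases the likelihood, so a tiny gain signals convergence.
        if loglik - prev < tol:
            break
        prev = loglik
    resp, _ = e_step(x, pi, mu, sigma)  # posteriors at the final parameters
    return pi, mu, sigma, resp


# Simulated example (made-up data, not from the project): 100 draws from N(0, 1)
# followed by 100 draws from N(5, 1).
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]
pi_hat, mu_hat, sigma_hat, post = fit_two_component_mixture(data)
# Posterior classification: assign each point to its more probable component.
labels = [1 if p > 0.5 else 0 for p in post]
truth = [0] * 100 + [1] * 100
accuracy = sum(a == b for a, b in zip(labels, truth)) / len(truth)
```

Thresholding the posterior probabilities at 0.5 gives the classification; keeping the probabilities themselves retains a measure of uncertainty for each observation. Each iteration here touches every data point, which hints at why scalable variants of EM matter for high-throughput data.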