The inference problems associated with high-dimensional genomic data pose fundamental challenges for modern statistics, machine learning, and data-mining research. Methods that have succeeded in this domain impose constraints on models that encode notions of simplicity, smoothness, or robustness. These constraints are often formalized either as priors in Bayesian methods or as geometric criteria in machine learning methods. The heart of this proposal is to develop the role of the geometry underlying the data and to relate it to probabilistic modeling. The specific research foci of the proposal are: 1) the exploitation of geometric assumptions for problems of model uncertainty and variable selection in high-dimensional models; 2) a Bayesian framework for the use of ancillary or unlabeled data in predictive modeling; 3) theory, methods, and computation for nonparametric Bayesian kernel models; 4) novel methods for nonlinear dimension reduction of high-dimensional data from regularization and geometric perspectives.

The proposal develops theory, methods, and computational tools for statistical modeling motivated by applications in functional genomics. Modern molecular biology has generated data of rapidly escalating scale and complexity: high-throughput genomics data, genetic and sequence information, proteomic and metabolomic data, as well as more traditional forms of biomedical and clinical information. Modeling these data to predict phenotypes such as prognosis, diagnosis, and pathway deregulation, and understanding the relevant variables and their associations, are fundamental challenges for modern statistics, machine learning, and data-mining research. These methodological developments are expected to have an impact on several other scientific areas, including biology, engineering, environmental and health science, and the social sciences.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0732260
Program Officer: Tie Luo
Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $298,374
Indirect Cost:
Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705