A fundamental research question for large-scale, high-dimensional inference is how to incorporate structural information in the data to improve large-scale computing, machine learning, and statistical inference. Learning and exploiting such structure is crucial to better analysis of complex datasets. This proposal aims to incorporate important structures of the data, and the association between the feature variables and the response variable, into the design of efficient experimental approaches and algorithms for large-scale problems, and to understand the theoretical properties of the resulting procedures. Project 1 develops a feature screening approach for the high-dimension, low-sample-size paradigm that takes into account the correlation structure among the features. The PI proposes an inference framework for selecting feature variables relevant to the response variable. In large-scale simultaneous inference, the hypotheses are often accompanied by structural prior information. Project 2 proposes a new multiple testing procedure that maintains control of the false discovery rate while incorporating this prior information. Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence among individual tests is an important but challenging problem in statistics. Project 3 proposes a multiple testing procedure that allows general dependence structures and heterogeneous dependence parameters.
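
To make the idea of false discovery rate control with prior information concrete, the following is a minimal sketch of a well-known baseline, not the procedure proposed in Project 2. It implements the standard Benjamini-Hochberg step-up rule and, optionally, a weighted variant in which each p-value is divided by a prior weight before the rule is applied. The function name, the weights, and the example p-values are illustrative assumptions; the weighted variant is usually justified for weights fixed in advance that average to one and for independent tests.

```python
def benjamini_hochberg(pvalues, alpha=0.05, weights=None):
    """Return a list of booleans, True where the hypothesis is rejected.

    weights=None gives the standard Benjamini-Hochberg step-up procedure.
    With weights, each p-value is divided by its normalised weight first,
    so hypotheses favoured by prior information are easier to reject.
    This is an illustrative baseline, not the proposal's own method.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if weights is None:
        weights = [1.0] * m
    if len(weights) != m:
        raise ValueError("weights must have the same length as pvalues")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_weight = sum(weights) / m
    if mean_weight <= 0:
        raise ValueError("at least one weight must be positive")

    # Normalise the weights so that they average to one.
    weights = [w / mean_weight for w in weights]

    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    adjusted = [p / w if w > 0 else float("inf")
                for p, w in zip(pvalues, weights)]

    # Step-up rule: find the largest rank k with adjusted_(k) <= k * alpha / m.
    order = sorted(range(m), key=lambda i: adjusted[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest weighted p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Made-up p-values for ten hypotheses (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

# Unweighted: only the two smallest p-values are rejected.
plain = benjamini_hochberg(pvals, alpha=0.05)

# Hypothetical prior information favouring the third hypothesis.
prior = [1, 1, 4, 1, 1, 1, 1, 1, 1, 1]
weighted = benjamini_hochberg(pvals, alpha=0.05, weights=prior)
```

In this toy example the unweighted rule rejects the first two hypotheses, while up-weighting the third hypothesis lets it be rejected as well. How the weights should be chosen, and what happens under dependence among the tests, are exactly the kinds of questions that a baseline like this leaves open.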

This proposal tackles challenging research problems arising from the frontiers of biological, medical, and scientific research, with the common theme of exploiting structural information in high-dimensional data. New tools will be developed for stochastic modeling, computational algorithms, parameter learning, and statistical inference on large-scale, high-dimensional data, for example brain fMRI data and datasets from genome-wide association studies of breast cancer. Dissemination of these developments will advance knowledge discovery and strengthen interdisciplinary collaborations. The research will also be integrated with educational practice through multidisciplinary courses on contemporary data mining and machine learning, and will benefit the training of undergraduate students, graduate students, and underrepresented minorities.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1308872
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2013-08-01
Budget End: 2016-07-31
Support Year:
Fiscal Year: 2013
Total Cost: $130,001
Indirect Cost:
Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715