Many statistical methods for dimensionality reduction, classification, and prediction require an estimate of a covariance or precision matrix. In high-dimensional settings, where the number of variables exceeds the sample size, the classical sample covariance estimator is known to perform poorly. This has led to a wealth of regularized high-dimensional covariance estimators, many of them proposed in the last decade. These estimators have been analyzed primarily in terms of how well they estimate the population covariance or precision matrix directly, rather than how they affect the performance of the statistical methods that require a regularized covariance estimate. A class of statistical methods of particular interest is sufficient dimension reduction (SDR), a powerful approach to reducing the dimensionality of the predictor in regression problems. Most SDR methodology and theory requires the number of variables to be less than the sample size, preventing its application to high-dimensional data. The PI, Co-PI, and their colleagues adapt sufficient dimension reduction methodology to high-dimensional settings via regularized covariance estimation. Specifically, they develop alternative SDR methodology, high-dimensional asymptotic analysis in which both the number of variables and the sample size grow, efficient computational algorithms, and applications to data.
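
To make the general idea concrete, the sketch below shows one way a regularized covariance estimate can be substituted into a standard SDR method. It is not the investigators' methodology: the SDR method (sliced inverse regression), the Ledoit-Wolf shrinkage estimator, and names such as `n_slices` and `n_directions` are illustrative choices, included only to show where a regularized covariance estimate enters when the number of variables exceeds the sample size.

```python
# Minimal sketch: sliced inverse regression (SIR) with a Ledoit-Wolf shrinkage
# covariance estimate in place of the sample covariance. Illustrative only;
# not the methodology developed under this award.
import numpy as np
from sklearn.covariance import LedoitWolf

def sir_directions(X, y, n_directions=2, n_slices=10):
    """Estimate SDR directions via SIR using a shrinkage covariance estimate."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # Regularized covariance estimate (well-defined even when p > n).
    sigma = LedoitWolf().fit(X).covariance_

    # Inverse square root of the shrinkage estimate via eigendecomposition.
    w, V = np.linalg.eigh(sigma)
    sigma_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Standardize the predictors with the regularized estimate.
    Z = Xc @ sigma_inv_sqrt

    # Slice the response and average standardized predictors within each slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of the slice-mean covariance give the directions.
    eigvals, eigvecs = np.linalg.eigh(M)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_directions]]

    # Map back to the original predictor scale.
    return sigma_inv_sqrt @ top

# Toy example with more variables than observations (p = 200 > n = 80).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=80)
B = sir_directions(X, y)
print(B.shape)  # (200, 2)
```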

Genetics, spectroscopy, climate studies, and remote sensing are a few of the many research fields that produce high-dimensional data: data with many more measured characteristics than subjects or cases. Many standard statistical methods for prediction, classification, and data reduction are either inapplicable or perform poorly in this setting. In response, statistical methods have been developed to extract a subset of the measured characteristics for use in predictive models; however, these methods operate under the assumption that only a relatively small number of the measured characteristics are relevant for prediction. The investigators address this deficiency by developing new methods for the reduction of high-dimensional data for use in predictive modeling which, unlike many existing methods, are able to extract relevant predictive information from all of the measured characteristics. In addition, the investigators develop publicly available computer software implementing these new methods, enabling their application by researchers and practitioners in many fields.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1105650
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2011-07-01
Budget End: 2015-06-30
Support Year:
Fiscal Year: 2011
Total Cost: $195,311
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455