High-dimensional data is prevalent in areas such as medicine, finance, environmental studies, imaging, networking, and the Internet. Making sense of this massive amount of data holds the key to critical scientific questions such as discovering biomarkers related to disease and evaluating the effects of climate change. The proposed research will use statistical learning techniques to develop novel multivariate approaches that incorporate sparsity (variable selection) and smoothness and that account for known structure in high-dimensional data. Extending multivariate methods to structured data improves signal recovery and feature selection. These techniques will be useful for spatio-temporal and image data, for example, where strong correlations arising from known structure hamper dimension reduction. Much attention has been given to multivariate methods for matrix data; the proposed research will extend many modern multivariate techniques, such as sparse principal components analysis, to the multi-dimensional, or tensor, framework. In addition, this proposal seeks to generalize much of the existing literature on regularized multivariate methods, such as principal components analysis, canonical correlations analysis, and linear discriminant analysis, by showing how many types of regularization can be incorporated into these methods. Finally, it seeks to develop algorithmic and computational frameworks that will allow researchers to apply these methods to modern massive data sets.

As multivariate analysis techniques enjoy nearly universal application, the theory and methods developed in this proposal will have wide-ranging significance in many applied fields. The methods are motivated in part by, and will have immediate impact in, neuroimaging, cancer genomics, and metabolomics studies. Other areas where this methodology will prove beneficial include climate studies, remote sensing, networking, engineering, finance, and imaging. Results of this research will be disseminated through the release of open-source, publicly available software and will be incorporated into the material for an advanced graduate course on statistical learning.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1209017
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2012-06-01
Budget End: 2015-05-31
Support Year:
Fiscal Year: 2012
Total Cost: $120,000
Indirect Cost:
Name: Rice University
Department:
Type:
DUNS #:
City: Houston
State: TX
Country: United States
Zip Code: 77005