High-dimensional data is prevalent in areas such as medicine, finance, environmental studies, imaging, networking, and the Internet. Making sense of this massive amount of data holds the key to critical scientific questions such as discovering biomarkers related to disease and evaluating the effects of climate change. The proposed research will use statistical learning techniques to develop novel multivariate approaches that incorporate sparsity (variable selection) and smoothness and that account for known structure in high-dimensional data. Extending multivariate methods to structured data improves signal recovery and feature selection. These techniques will be useful for spatio-temporal and image data, for example, where strong correlations arising from known structure hinder dimension reduction. Much attention has been given to multivariate methods for matrix data; the proposed research will extend many modern multivariate techniques, such as sparse principal components analysis, to the multi-dimensional or tensor framework. This proposal also seeks to generalize much of the existing literature on regularized multivariate methods, such as principal components analysis, canonical correlation analysis, and linear discriminant analysis, by illustrating how to incorporate many types of regularization. Finally, it seeks to develop algorithmic and computational frameworks that will allow researchers to apply these methods to modern massive data sets.
As multivariate analysis techniques enjoy nearly universal application, the theory and methods developed in this proposal will have wide-ranging significance in many applied fields. The methods are motivated in part by, and will have immediate impact in, neuroimaging studies, cancer genomics, and metabolomics studies. Other areas where this methodology will prove beneficial include climate studies, remote sensing, networking, engineering, finance, and imaging. Results of this research will be disseminated through the release of open-source, publicly available software and will be incorporated into the material of an advanced graduate course on statistical learning.