In the big data era, observations are collected on ever larger numbers of variables and cases. Even after preliminary reductions of the data, subsets of interest may have high dimensionality and appreciable sample size. The interesting structure in such data is often low dimensional; indeed, such models occur in many scientific domains, from econometrics to genomics, signal processing, and well beyond. This project will investigate the estimation and testing of a particular class of low dimensional structures, namely low rank perturbations of scaled identity or diagonal matrices. It will consider high dimensional versions of multivariate statistical methods that have found wide use for traditional data, including principal components, multiple response regression, and canonical correlations, as well as newer applications such as matrix denoising.

The project will study the proportional limit setting, in which the number of variables and the sample size are of the same order of magnitude. It will explore the phase transition phenomenon for a wide class of multivariate methods, using in part the systematic framework developed by A. T. James. Contiguity properties below the phase transition will be investigated, as will Gaussian behavior above the critical point. A separate low noise approximation will be used to derive long-sought power approximations for largest root tests. In estimation, the project will study optimal shrinkage procedures for the empirical eigenvalues that correspond to the low rank structure, making explicit how the results depend strongly on the particular loss function chosen. Both scalar nonlinearities and thresholding techniques will be considered. The project will build upon preliminary work for covariance estimation and matrix denoising, and will also develop results for other multivariate settings such as low rank factor models, canonical correlations, and discriminant analysis.
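As a rough illustration of the phase transition and eigenvalue shrinkage discussed above, the following Python sketch simulates the simplest case: a rank-one perturbation of the identity covariance with Gaussian data. It is not taken from the project itself. The function names, the choice of NumPy, and the parameter values are assumptions made for illustration, and the shrinker shown is only the plain inversion of the limiting eigenvalue map, which need not be optimal for any particular loss function.

```python
import numpy as np


def top_sample_eigenvalue(n, p, spike, rng):
    """Largest eigenvalue of the sample covariance matrix when the
    population covariance is diag(spike, 1, ..., 1) (illustrative model)."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)          # inflate variance of one coordinate
    S = X.T @ X / n                    # sample covariance (mean known to be 0)
    return np.linalg.eigvalsh(S)[-1]   # eigvalsh returns ascending order


def predicted_limit(spike, gamma):
    """Limit of the top sample eigenvalue as p/n -> gamma, for 0 < gamma <= 1.
    Below the threshold 1 + sqrt(gamma) it sticks to the bulk edge."""
    if spike <= 1 + np.sqrt(gamma):
        return (1 + np.sqrt(gamma)) ** 2
    return spike * (1 + gamma / (spike - 1))


def debias_eigenvalue(lam, gamma):
    """Invert predicted_limit above the bulk edge; return 1 otherwise.
    One simple scalar shrinker; the best choice depends on the loss."""
    if lam <= (1 + np.sqrt(gamma)) ** 2:
        return 1.0
    b = lam + 1 - gamma
    return (b + np.sqrt(b * b - 4 * lam)) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 1000                  # proportional regime, gamma = 0.5
    gamma = p / n
    for spike in (1.2, 1.7, 2.5, 4.0):
        lam = top_sample_eigenvalue(n, p, spike, rng)
        print(spike, lam, predicted_limit(spike, gamma),
              debias_eigenvalue(lam, gamma))
```

In this sketch, population spikes below 1 + sqrt(gamma) should leave the top sample eigenvalue close to the bulk edge, while larger spikes should produce an eigenvalue that separates from the bulk and is biased upward, which is what the shrinkage step tries to undo.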

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1407813
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2014-07-15
Budget End: 2020-06-30
Support Year:
Fiscal Year: 2014
Total Cost: $626,835
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305