Modern scientific data acquisition often produces datasets with many measurements on each of a large number of individuals. The structure of interest in such data may be low dimensional or sparse. Examples arise in domains ranging from econometrics to genomics to image and signal processing, and beyond. The project explores approximate methods of statistical inference for such structure in a representative set of contemporary settings: high dimensional estimation, generalized linear mixed models, and low rank multivariate models. It aims to develop approximate methods backed by an optimality theory and/or performance guarantees. The work is expected to provide new theoretical insights into current problems in specific application domains such as Magnetic Resonance Imaging and quantitative genetics.

The project will bring ideas from classical decision theory to compressed sensing and robust linear modeling, rigorously solving nonconvex optimization problems and obtaining reconstruction performance provably better than that of traditional convex optimization methods. It will exploit decision-theoretic ideas from the proposer's previous work to provide new theoretical insights into a pressing practical problem in compressed sensing: deriving optimal variable-density sampling schedules applicable to Magnetic Resonance Imaging and NMR spectroscopy. The project will also study the statistical performance of deterministic approximate inference methods that are popular in machine learning but whose properties, such as consistency, asymptotic normality, and efficiency, have received little theoretical attention. That study will begin with concrete examples in the realm of generalized linear mixed models, analyzed from a frequentist perspective, and will seek to establish first-of-their-kind results on the asymptotic efficiency of Expectation Propagation. Finally, the project will study approximate inference for the eigenstructure of highly multivariate models with low dimensional structure. It will adapt James' classical framework for multivariate analysis to a broad class of multispike models. Through collaboration with quantitative geneticists, it will develop methods for inference on low dimensional structure in high dimensional genetic covariance matrices. In both cases, methods from random matrix theory will be essential.
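For concreteness, the short sketch below (illustrative only, not taken from the proposal) simulates a single-spike covariance model, the simplest instance of the low dimensional eigenstructure described above, and compares the largest sample eigenvalue with the random matrix theory prediction ell * (1 + gamma / (ell - 1)) that holds in the proportional regime gamma = p/n when ell > 1 + sqrt(gamma). The variable names and parameter values (n, p, ell) are assumptions made for the example.

    import numpy as np

    # Illustrative sketch (not from the proposal): a single-spike covariance model.
    # The population covariance is the identity plus a rank-one spike of strength
    # ell along a unit vector v.  For p/n -> gamma and ell > 1 + sqrt(gamma),
    # random matrix theory predicts the largest sample eigenvalue converges not
    # to ell but to the inflated value ell * (1 + gamma / (ell - 1)).
    rng = np.random.default_rng(0)
    n, p, ell = 2000, 400, 5.0          # sample size, dimension, spike strength (assumed values)
    gamma = p / n

    v = rng.standard_normal(p)
    v /= np.linalg.norm(v)              # unit spike direction
    Sigma_half = np.eye(p) + (np.sqrt(ell) - 1.0) * np.outer(v, v)   # square root of Sigma = I + (ell - 1) v v'

    X = rng.standard_normal((n, p)) @ Sigma_half    # n i.i.d. rows drawn from N(0, Sigma)
    S = X.T @ X / n                                  # sample covariance matrix
    lam_top = np.linalg.eigvalsh(S)[-1]              # largest sample eigenvalue

    print("population spike     :", ell)
    print("top sample eigenvalue:", round(lam_top, 3))
    print("RMT prediction       :", round(ell * (1 + gamma / (ell - 1)), 3))

In this regime the top sample eigenvalue concentrates near the inflated value rather than near the true spike strength; this upward bias is the kind of effect that random matrix theory based corrections for high dimensional covariance estimation are designed to address.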

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1811614
Program Officer: Gabor Szekely
Budget Start: 2018-07-01
Budget End: 2022-06-30
Fiscal Year: 2018
Total Cost: $448,408
Name: Stanford University
City: Stanford
State: CA
Country: United States
Zip Code: 94305