Structured, high-dimensional regression problems are a current topic of major interest due to recent applications in modern scientific fields. Domain experts throughout the sciences (e.g., quantum physicists or data scientists in bioinformatics) are now equipped with powerful regularized algorithms that leverage the structures hidden in high-dimensional data. This regularization enables prediction and estimation of unknowns where classical statistical methods are inefficient. On the other hand, these newly developed regularized algorithms often lack automatic inference capabilities in the form of confidence intervals. The project's goal is to develop automatic confidence intervals on top of differentiable regularized estimators, with little or no constraint on the type of regularization used. This goal is algorithm-centric: develop methodologies and software that, given a differentiable regularized estimator chosen and favored by a domain expert, empower that estimator with confidence intervals comparable to those available in classical statistics. The training component includes graduate and undergraduate coursework, graduate student mentorship, and participation in the Rutgers REU program.

The common regularization techniques that leverage structure in high-dimensional data incur a bias that is incompatible with the confidence intervals provided by classical statistical theory. Given a regularized estimator, the goal of the statistician is to provide a valid confidence interval, for instance by removing this bias and standardizing the variance. Such an inference scheme is currently possible for only a few specific regularized estimators, and one goal of the project is to extend this capability to almost any differentiable estimator. The motivation is that, if the estimator of interest is a differentiable function of the data, gradients of that estimator with respect to the observed data provide rich information that can be leveraged to construct valid confidence intervals in high-dimensional models where the literature has, so far, focused on prediction. The challenges related to this approach span statistics, probability theory, and computer science: the project aims to develop new, flexible asymptotic normality results for provable Type I error control, to understand new relationships between gradients and bias and between gradients and variance, to explain the role of degrees of freedom or their proxies in high dimensions, and to develop algorithms and software that efficiently compute the gradients required for inference. This Post-Differentiation Inference approach would empower domain experts throughout the sciences by providing uncertainty quantification on top of field-specific estimators, for instance in quantum or genomic applications.
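To make the gradient-to-inference pipeline concrete, the sketch below (Python/JAX) shows one way automatic differentiation can supply the degrees-of-freedom proxy used to debias a regularized estimator. This is a minimal sketch under strong simplifying assumptions (isotropic Gaussian design, a smooth ridge penalty chosen so that differentiation is exact, and a crude plug-in standard error); it is not the project's actual methodology, and the helper names fit_ridge, degrees_of_freedom, and debiased_interval are hypothetical.

import jax
import jax.numpy as jnp

def fit_ridge(y, X, lam):
    # Ridge estimator: smooth in y, so automatic differentiation applies.
    p = X.shape[1]
    return jnp.linalg.solve(X.T @ X + lam * jnp.eye(p), X.T @ y)

def degrees_of_freedom(y, X, lam):
    # df proxy = trace of the Jacobian of the fitted values with respect
    # to y, obtained by autodiff rather than a closed form; the same call
    # works for any differentiable estimator, not just ridge.
    fitted = lambda y_: X @ fit_ridge(y_, X, lam)
    return jnp.trace(jax.jacfwd(fitted)(y))  # dense (n, n) Jacobian; small n only

def debiased_interval(y, X, lam, j, z=1.96):
    # Degrees-of-freedom-adjusted debiasing of coordinate j with a
    # normal-approximation interval; validity rests on design assumptions.
    n = X.shape[0]
    theta = fit_ridge(y, X, lam)
    resid = y - X @ theta
    df = degrees_of_freedom(y, X, lam)
    theta_db = theta[j] + X[:, j] @ resid / (n - df)  # bias correction
    se = jnp.linalg.norm(resid) / (n - df)            # plug-in standard error proxy
    return theta_db - z * se, theta_db + z * se

# Toy usage on synthetic data with an isotropic Gaussian design:
n, p = 200, 50
X = jax.random.normal(jax.random.PRNGKey(0), (n, p))
theta_star = jnp.zeros(p).at[0].set(1.0)
y = X @ theta_star + jax.random.normal(jax.random.PRNGKey(1), (n,))
lo, hi = debiased_interval(y, X, lam=10.0, j=0)

For non-smooth estimators such as the lasso, the same Jacobian trace recovers the familiar count of nonzero coefficients as the degrees-of-freedom proxy, and the precise pivotal normalization in the literature differs from the crude one above.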

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1945428
Program Officer: Gabor Szekely
Budget Start: 2020-06-01
Budget End: 2025-05-31
Fiscal Year: 2019
Total Cost: $61,587
Name: Rutgers University
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854