Methods directly applicable to determining the regulatory cell mechanisms associated with vaccine responses will be developed as an integral part of this project. Such mechanisms are not directly observable or measurable, but a wealth of indirect measurements can be obtained, for instance on antibody titer-driven effects. New methodology that meets the challenge of extracting these hidden signals from a very large volume of data, consisting of diverse arrays of indirect measurements, will be developed and put to the test immediately on data sets from emerging pandemics. More broadly, techniques that allow for reliable, mathematically grounded inference and prediction from essential but hidden signatures will be developed and applied to data sets arising from neuroscience and to high-throughput text data.

This project undertakes a foundational study of inference and prediction from high-dimensional data whose low-dimensional embeddings are modeled by factor models. Within new classes of identifiable factor regression models, in which the latent factors are interpretable, new scalable methods for estimating the regression coefficients of the latent factors will be developed. A cornerstone of the research will be the optimality of estimation from a finite-sample, minimax perspective, together with the derivation of the asymptotic limit of the estimates and, especially, of their efficient asymptotic variance. Prediction from high-dimensional dependent features whose covariance matrix has reduced effective rank will be analyzed under generic factor regression models. In particular, interpolating predictors, popular in deep learning, will be contrasted with other contenders, with the aim of offering a fundamental understanding of model-free versus model-based prediction when data arise from a factor regression model. Sparse topic models, together with inference and prediction from the hidden topics, will be studied as companion models for data in which all the features are discrete. Applications of the newly developed, scalable, and theoretically founded methods will be a focal point of this project.
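To make the contrast between interpolating and model-based predictors concrete, here is a minimal sketch. It assumes a hypothetical linear factor regression model with Gaussian latent factors Z, observed features X = Z Aᵀ + E, and response y = Z β + ε. The helpers `make_model`, `draw`, `min_norm_interpolator`, `pcr`, and `prediction_mse`, along with the dimensions and noise level, are illustrative choices and are not the project's methods. The interpolating predictor is the minimum-norm least-squares fit (computed with `numpy.linalg.pinv`). Principal component regression (PCR) stands in here as one standard model-based contender. No intercept or centering is used because the simulated data have mean zero.

```python
import numpy as np


def make_model(p, K, seed=0):
    """Draw a hypothetical loading matrix A (p x K) and factor coefficients beta (K,)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p, K))
    beta = rng.normal(size=K)
    return A, beta


def draw(A, beta, n, rng, sigma=0.5):
    """Sample n observations: Z ~ N(0, I_K), X = Z A^T + E, y = Z beta + eps."""
    p, K = A.shape
    Z = rng.normal(size=(n, K))                 # latent (unobserved) factors
    X = Z @ A.T + rng.normal(size=(n, p))       # observed features with unit-variance noise
    y = Z @ beta + sigma * rng.normal(size=n)   # response depends on X only through Z
    return X, y


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X w = y; fits the data exactly when p > n and X has full row rank."""
    return np.linalg.pinv(X) @ y


def pcr(X, y, K):
    """Principal component regression: least squares of y on the top-K principal directions of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_K = Vt[:K].T                              # p x K estimated factor directions
    gamma, *_ = np.linalg.lstsq(X @ V_K, y, rcond=None)
    return V_K @ gamma                          # map back to a p-dimensional coefficient vector


def prediction_mse(w, X, y):
    """Mean squared prediction error of the linear predictor x -> x @ w."""
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(1)
A, beta = make_model(p=500, K=3, seed=0)
X_train, y_train = draw(A, beta, n=100, rng=rng)
X_test, y_test = draw(A, beta, n=2000, rng=rng)

w_interp = min_norm_interpolator(X_train, y_train)
w_pcr = pcr(X_train, y_train, K=3)

train_resid = float(np.max(np.abs(X_train @ w_interp - y_train)))
print(f"max training residual, interpolator: {train_resid:.1e}")
print(f"test MSE, min-norm interpolator:     {prediction_mse(w_interp, X_test, y_test):.3f}")
print(f"test MSE, PCR (K = 3):               {prediction_mse(w_pcr, X_test, y_test):.3f}")
```

You can vary `n`, `p`, `K`, and `sigma` to get a quick empirical feel for how the two predictors behave under a factor structure. The simulation is illustrative only and does not stand in for the finite-sample or asymptotic analysis described above.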

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 2015195
Program Officer: Pena Edsel
Budget Start: 2020-07-01
Budget End: 2023-06-30
Fiscal Year: 2020
Total Cost: $300,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850