Modern computational capabilities, modern theory, and the expanded data sets produced by modern scientific equipment have greatly increased the scope of statistical inference. This research project investigates a set of questions in probability and statistics raised by large-scale data collection. Although empirical Bayes methods are increasingly used, they have proved difficult to justify. The approach under development in this project applies exponential family theory in a novel way, with the goal of clarifying how empirical Bayes analyses converge to traditional Bayes methods as sample sizes increase. The project aims to develop two sets of tools: empirical Bayes methods that use large-scale parallel data sets, such as those from microarray studies, to improve estimation in situations comprising many small sub-experiments, each of which by itself has low accuracy; and improved Monte Carlo methods for the computer solution of massive optimization problems.
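
To make the idea of borrowing strength across many small sub-experiments concrete, the following is a minimal sketch, not the project's own method. It assumes a simple normal model (each observation z_i is N(mu_i, sigma2) with known sigma2) and uses the classical positive-part James-Stein rule, which can be read as an empirical Bayes estimate. The function names `james_stein_shrink` and `simulate`, and all simulation settings, are hypothetical choices made for illustration.

```python
import random


def james_stein_shrink(z, sigma2=1.0):
    """Shrink observations z_i ~ N(mu_i, sigma2) toward their grand mean.

    Empirical Bayes reading (an assumption of this sketch): if the true
    effects follow mu_i ~ N(m, A), the Bayes estimate is
    m + B * (z_i - m) with B = A / (A + sigma2). Here m and B are not
    known in advance; both are estimated from the full collection of z.
    """
    n = len(z)
    if n < 4:
        raise ValueError("need at least 4 sub-experiments")
    m_hat = sum(z) / n
    s = sum((zi - m_hat) ** 2 for zi in z)
    # Positive-part James-Stein estimate of the shrinkage factor B.
    b_hat = max(0.0, 1.0 - (n - 3) * sigma2 / s) if s > 0 else 0.0
    return [m_hat + b_hat * (zi - m_hat) for zi in z]


def simulate(n=2000, prior_sd=1.0, seed=0):
    """Compare mean squared error of raw and shrunken estimates.

    Simulated data only: n sub-experiments, unit noise variance.
    """
    rng = random.Random(seed)
    mu = [rng.gauss(0.0, prior_sd) for _ in range(n)]
    z = [rng.gauss(m, 1.0) for m in mu]
    est = james_stein_shrink(z)
    mse_raw = sum((zi - mi) ** 2 for zi, mi in zip(z, mu)) / n
    mse_eb = sum((ei - mi) ** 2 for ei, mi in zip(est, mu)) / n
    return mse_raw, mse_eb


if __name__ == "__main__":
    raw, eb = simulate()
    print(f"MSE of raw observations:   {raw:.3f}")
    print(f"MSE of shrunken estimates: {eb:.3f}")
```

In this simulated setting each observation alone is noisy, yet the shrunken estimates have lower average squared error because the shrinkage factor is learned from all sub-experiments together.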

Specific topics under investigation in this research project include large-scale empirical Bayes strategies, importance sampling for computer-assisted inference in formerly intractable situations, and a theory of stability assessment for traditional methods of accuracy estimation. Exponential families of probability distributions play a central role in both computation and statistical inference. A particularly stubborn impediment to their use in massive data analyses is the lack of a suitable norming constant in the exponential family density. This research project further develops computational methods based on solutions of appropriate variational problems; a promising application is in the area of graphical models. A second application of exponential families involves efficient deconvolution of data sets to obtain empirical Bayes estimates.
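
As an illustration of how importance sampling can stand in for a missing norming constant, the sketch below estimates that constant for a toy one-parameter exponential family. It is not the variational approach this project develops. The family, the normal proposal distribution, and the names `log_unnormalized`, `estimate_norming_constant`, and `exact_norming_constant` are all assumptions made for this example; the toy family was chosen because its constant is known in closed form, so the estimate can be checked.

```python
import math
import random


def log_unnormalized(x, theta):
    """Log of the unnormalized density h(x) = exp(theta * x - x^2 / 2).

    A toy exponential family chosen for this sketch; its norming
    constant is sqrt(2 * pi) * exp(theta^2 / 2).
    """
    return theta * x - 0.5 * x * x


def estimate_norming_constant(theta, n=100_000, proposal_sd=1.5, seed=0):
    """Importance sampling estimate of Z(theta) = integral of h(x) dx.

    Draws x from the proposal q = N(0, proposal_sd^2) and averages the
    importance weights h(x) / q(x). The proposal is wider than the
    target so that the weights stay bounded.
    """
    rng = random.Random(seed)
    log_q_const = -math.log(proposal_sd * math.sqrt(2.0 * math.pi))
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, proposal_sd)
        log_q = log_q_const - 0.5 * (x / proposal_sd) ** 2
        total += math.exp(log_unnormalized(x, theta) - log_q)
    return total / n


def exact_norming_constant(theta):
    """Closed-form constant for the toy family, used only as a check."""
    return math.sqrt(2.0 * math.pi) * math.exp(0.5 * theta * theta)


if __name__ == "__main__":
    theta = 0.5
    print("importance sampling estimate:", estimate_norming_constant(theta))
    print("closed-form value:           ", exact_norming_constant(theta))
```

In realistic high-dimensional families no closed form is available and a good proposal is much harder to find, which is part of what makes the problem stubborn.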

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
1608182
Program Officer
Gabor Szekely
Project Start
Project End
Budget Start
2016-06-01
Budget End
2020-05-31
Support Year
Fiscal Year
2016
Total Cost
$700,000
Indirect Cost
Name
Stanford University
Department
Type
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94305