Modern computational capabilities, modern theory, and the expanded data sets produced by modern scientific equipment have greatly increased the scope of statistical inference. This research project investigates a set of questions in probability and statistics raised by large-scale data collection. Although increasingly used, empirical Bayes methods have proved difficult to justify theoretically. The approach under development in this project brings a novel application of exponential family theory to bear, with the goal of clarifying how empirical Bayes analyses converge to traditional Bayes methods as sample sizes increase. The project aims to develop empirical Bayes methods that use large-scale parallel data sets, such as those from microarray studies, to improve estimation in situations where many small sub-experiments are reported, each of which by itself has low accuracy, and to develop improved Monte Carlo methods for the computer solution of massive optimization problems.
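As a concrete illustration of the parallel sub-experiment setting, the sketch below applies Tweedie's formula, one standard empirical Bayes device for normal-means problems; it is offered as a generic example, not as the method under development in this project. Each sub-experiment is summarized by a z-value assumed to satisfy z_i ~ N(mu_i, 1), and the posterior mean of mu_i is read off from an estimate of the marginal density of all the z's. The function name tweedie_shrinkage and the toy data are hypothetical.

import numpy as np
from scipy.stats import gaussian_kde

def tweedie_shrinkage(z, eps=1e-3):
    """Empirical Bayes posterior means via Tweedie's formula.

    Assumes z[i] ~ N(mu[i], 1); then E[mu_i | z_i] equals
    z_i + d/dz log f(z_i), where f is the marginal density of
    the z's, estimated here with a simple Gaussian kernel.
    """
    f = gaussian_kde(z)
    log_f = lambda t: np.log(f(t))
    score = (log_f(z + eps) - log_f(z - eps)) / (2 * eps)  # d/dz log f
    return z + score

# Toy usage: 5000 parallel sub-experiments, 90 percent null.
rng = np.random.default_rng(0)
mu = np.where(rng.random(5000) < 0.9, 0.0, rng.normal(3.0, 1.0, 5000))
z = mu + rng.normal(size=5000)
mu_hat = tweedie_shrinkage(z)

Because the estimate borrows strength across all 5000 cases, the many null z's are shrunk heavily toward zero while the genuinely large effects are left nearly intact, which is the accuracy gain for individually noisy sub-experiments described above.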
Specific topics under investigation in this research project include large-scale empirical Bayes strategies, importance sampling for computer-assisted inference in formerly intractable situations, and a theory of stability assessment for traditional methods of accuracy estimation. Exponential families of probability distributions play a central role in both computation and statistical inference. A particularly stubborn impediment to their use in massive data analyses is that the norming constant in the exponential family density is typically unavailable in closed form. This research project further develops computational methods based on solutions of appropriate variational problems; a promising application is in the area of graphical models. A second application of exponential families involves efficient deconvolution of data sets to obtain empirical Bayes estimates.
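For concreteness, the impediment can be stated in standard exponential family notation; the display below is textbook material, and the variational identity is the usual convex duality for the log-partition function, not a result specific to this project:

\[
f_\eta(x) = \exp\{\eta^\top T(x) - \psi(\eta)\}\, f_0(x),
\qquad
\psi(\eta) = \log \int \exp\{\eta^\top T(x)\}\, f_0(x)\, dx .
\]

In high dimensions the integral defining \psi(\eta) is generally intractable, but \psi admits the variational representation

\[
\psi(\eta) = \sup_{\mu \in \mathcal{M}} \bigl\{ \eta^\top \mu - \psi^*(\mu) \bigr\},
\]

where \psi^* is the convex conjugate of \psi and \mathcal{M} is the set of realizable mean parameters; variational methods for graphical models proceed by optimizing over tractable approximations to \mathcal{M} and \psi^*.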
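Relatedly, when \psi(\eta) cannot be computed, expectations under f_\eta can still be approximated by self-normalized importance sampling, since the unknown norming constant cancels when the weights are normalized. The Python sketch below is a generic illustration under assumed names (snis_mean, a one-dimensional exponential tilt of a normal carrier); it is not the project's algorithm.

import numpy as np
from scipy import stats

def snis_mean(h, log_p_tilde, proposal, n=100_000, rng=None):
    """Self-normalized importance sampling estimate of E_p[h(X)]
    when the target density p is known only up to its norming
    constant. `proposal` is a frozen scipy.stats distribution;
    choosing it well is the hard part in practice.
    """
    rng = rng or np.random.default_rng()
    x = proposal.rvs(size=n, random_state=rng)
    log_w = log_p_tilde(x) - proposal.logpdf(x)   # unnormalized log weights
    w = np.exp(log_w - log_w.max())               # stabilize before normalizing
    w /= w.sum()                                  # norming constant cancels here
    return np.sum(w * h(x))

# Toy target: the exponential tilt exp(eta*x) of a standard normal
# carrier, with the norming constant deliberately ignored.
eta = 1.5
est = snis_mean(h=lambda x: x,
                log_p_tilde=lambda x: eta * x - 0.5 * x**2,
                proposal=stats.norm(loc=1.0, scale=1.5))

The tilted density here happens to be N(eta, 1), so the estimate can be checked against the exact mean eta = 1.5; in realistic high-dimensional problems the choice of proposal dominates the accuracy, which is where improved Monte Carlo methods of the kind mentioned above enter.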