Data sets arising in current applications of statistics and machine learning are often very large and require large models for their analysis. Bayesian inference and global optimization are two powerful methods for learning from such data, but the size of these data sets and the resulting computational burden greatly limit the applicability of these methods. The research in this project aims to increase the computational efficiency of these methods, thereby substantially expanding their usefulness for the analysis of large data sets. The methods and algorithms developed in this research will be implemented on modern distributed computing platforms and made freely available to the scientific community. The results will have wide applications in statistics and machine learning.

Specifically, the use of mini-batches in Markov chain Monte Carlo (MCMC) will be investigated. MCMC is perhaps the most widely used computational approach to Bayesian statistical inference. Because each step in simulating the Markov chain requires scanning all of the observations, this computation is prohibitive for large data sets. On the other hand, machine learning researchers have found that stochastic optimization techniques, which examine only a mini-batch of data points at a time, can deliver excellent performance. In this project, a framework unifying mini-batch-based MCMC and global optimization will be developed. It is shown that simulation from a tempered version of the posterior distribution can be approximated by an MCMC process with Metropolis-Hastings updates that depend only on mini-batches. This approach will be combined with equi-energy sampling to achieve a unified methodology for simulation and global optimization. This framework will allow us to improve the performance of both MCMC methods and non-convex global optimization methods.
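One elementary way to see why mini-batches and tempering are linked: if a mini-batch B of size m is drawn uniformly from N observations, the expected value of the unscaled mini-batch log-likelihood sum is (m/N) times the full-data log-likelihood, which is exactly the log-likelihood of the posterior tempered at temperature T = N/m. The sketch below illustrates this with a random-walk Metropolis-Hastings chain whose acceptance ratio uses only the unscaled mini-batch sum. It is a hypothetical, naive illustration under stated assumptions (Gaussian mean model with unit variance, flat prior, hypothetical function names `log_lik` and `minibatch_mh`), not the project's algorithm; because the mini-batch estimate is noisy, the chain only approximates the tempered posterior.

```python
import math
import random


def log_lik(theta, x):
    """Gaussian log-likelihood with unit variance, up to an additive constant."""
    return -0.5 * (x - theta) ** 2


def minibatch_mh(data, n_iter=4000, batch_size=50, step=0.1, theta0=0.0, seed=0):
    """Random-walk Metropolis-Hastings whose acceptance ratio uses only a
    random mini-batch of the data at each step.

    Hypothetical illustration: the mini-batch log-likelihood sum is NOT
    rescaled by len(data) / batch_size, so in expectation the chain sees the
    posterior tempered at T = len(data) / batch_size (flat prior assumed).
    The mini-batch noise means this target is only approximated.
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        batch = rng.sample(data, batch_size)     # mini-batch drawn without replacement
        log_ratio = sum(log_lik(proposal, x) - log_lik(theta, x) for x in batch)
        # Accept with probability min(1, exp(log_ratio)); min() avoids overflow.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        samples.append(theta)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(2.0, 1.0) for _ in range(5000)]  # synthetic data, true mean 2
    samples = minibatch_mh(data)
    tail = samples[len(samples) // 2:]  # discard the first half as burn-in
    print(sum(tail) / len(tail))
```

With N = 5000 and m = 50, the implied temperature is T = 100, and for this model the tempered posterior has standard deviation 1/sqrt(50), about 0.14, around the sample mean, so the printed average should lie near 2. Rescaling the mini-batch sum by N/m would instead estimate the untempered log-ratio, but it would also multiply the mini-batch noise by N/m; leaving it unscaled trades that noise for tempering.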

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1811920
Program Officer: Gabor Szekely
Budget Start: 2018-08-01
Budget End: 2021-07-31
Fiscal Year: 2018
Total Cost: $200,000
Name: Stanford University
City: Stanford
State: CA
Country: United States
Zip Code: 94305