Modern massive data sets are growing in volume and are highly heterogeneous. Examples include data from internet searches, social networks, mobile devices, satellites, genomics, and medical scans. Bayesian approaches are particularly useful in such contexts because the complex structures in the data can be incorporated naturally into Bayesian hierarchical models. In addition, uncertainty quantification follows directly from Bayesian computation. However, because of storage and computational bottlenecks, traditional Bayesian computation on a single machine is no longer applicable to modern massive data. In this project, a set of nonparametric Bayesian aggregation procedures with theoretical justification is developed, based on a standard parallel computing strategy known as Divide-and-Conquer. This research will significantly enhance the availability of Bayesian tools and software for analyzing massive data. The educational plan of the project consists of graduate student advising and the offering of special topics courses.

This project consists of three major components. First, the PIs will establish a Gaussian approximation of general nonparametric posterior distributions, which will serve as a theoretical foundation for general distributed Bayesian algorithms. Second, the PIs will develop a nonparametric Bayesian aggregation procedure with theoretical guarantees that is particularly suited to handling massive data in parallel. Third, the PIs will develop an efficient parallel Markov chain Monte Carlo (MCMC) algorithm for nonparametric Bayesian models that will perform as well as traditional MCMC at substantially lower computational cost. This research will lead to the emergence of a "Splitotics (Split+Asymptotics) Theory" that provides theoretical guidelines for Bayesian practice. The smoothing spline inference results recently obtained by the PIs will be used as a tool for achieving these goals.
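
To make the Divide-and-Conquer idea concrete, the following is a minimal sketch on a toy parametric model (a normal mean with known noise variance), not the nonparametric procedure the project proposes. The function names, the prior settings, and the choice to give each shard a prior with precision divided by the number of shards are all illustrative assumptions. In this conjugate setting, multiplying the Gaussian subposteriors reproduces the full-data posterior, which is the kind of agreement an aggregation procedure aims for.

```python
import random


def gaussian_posterior(data, prior_mean, prior_prec, noise_var):
    """Posterior (mean, precision) of a normal mean with known noise variance."""
    post_prec = prior_prec + len(data) / noise_var
    post_mean = (prior_prec * prior_mean + sum(data) / noise_var) / post_prec
    return post_mean, post_prec


def aggregate(subposteriors):
    """Multiply Gaussian subposteriors: precisions add, means are precision-weighted."""
    total_prec = sum(prec for _, prec in subposteriors)
    mean = sum(m * prec for m, prec in subposteriors) / total_prec
    return mean, total_prec


def divide_and_conquer(data, num_shards, prior_mean=0.0, prior_prec=1.0, noise_var=1.0):
    """Split the data, compute one posterior per shard, then aggregate."""
    # Hypothetical sharding rule: every num_shards-th observation goes to a shard.
    shards = [data[i::num_shards] for i in range(num_shards)]
    # Assumption: each shard uses the prior with precision scaled by 1/num_shards,
    # so that the product of the subposteriors counts the prior exactly once.
    subposteriors = [
        gaussian_posterior(shard, prior_mean, prior_prec / num_shards, noise_var)
        for shard in shards
    ]
    return aggregate(subposteriors)


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]

full_mean, full_prec = gaussian_posterior(data, 0.0, 1.0, 1.0)
dc_mean, dc_prec = divide_and_conquer(data, num_shards=8)
print(f"full data : mean={full_mean:.6f}, precision={full_prec:.3f}")
print(f"aggregated: mean={dc_mean:.6f}, precision={dc_prec:.3f}")
```

Each shard's posterior can be computed on a separate machine, and only two numbers per shard need to be communicated. In nonparametric models the subposteriors are not exactly Gaussian, which is why a Gaussian approximation result of the kind described in the first component would be needed to justify such an aggregation step.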

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1712907
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2017-09-01
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2017
Total Cost: $140,000
Indirect Cost:
Name: Purdue University
Department:
Type:
DUNS #:
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907