This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
In this project, the investigator develops computational methods for Bayesian optimal sequential design for the estimation of random functions. Random function estimation, either in the context of Bayesian nonparametric regression or in the analysis of spatial and spatio-temporal processes, has become a ubiquitous tool in most areas of science. While methods for the estimation of random functions are reasonably well developed, optimal design for such problems is in its infancy. This proposal presents a research program that develops a novel computational framework for Bayesian optimal sequential design for random function estimation. The framework is based on evolutionary Markov chain Monte Carlo (EMCMC), which combines ideas from genetic or evolutionary algorithms with the power of Markov chain Monte Carlo. It can handle general models for the observations, such as generalized linear models and scale mixtures of normals. In addition, the methodology easily accommodates multiple covariates and general priors on the space of regression functions built from basis functions such as splines and Gaussian kernels. Finally, the framework allows optimality criteria with general utility functions that may include competing objectives, such as minimization of costs, minimization of the distance between the true and estimated functions, and minimization of the prediction error.
Estimation of random functions arises in many application areas, such as environmental science, epidemiology, climatology, and engineering. An important example in engineering is the statistical approximation of computer model output, e.g., approximation of fluid flow simulators and rocket booster simulators. Usually, scientists want to run such simulators under many different experimental conditions. However, each run of a computer model is typically extremely expensive and time-consuming. An effective way to deal with these resource constraints is to run the simulator for a relatively small number of experimental conditions and to fit a statistical nonparametric model that approximates the output of the simulator. The proposed computational methods allow Bayesian optimal choice of experimental conditions in a sequential fashion; that is, the next experimental conditions are chosen based on what has been learned from the previous ones. Thus, the proposed methodology for sequential choice of design points is expected to yield substantial improvements in cost and efficiency. Another important application of the proposed Bayesian methodology is the dynamic monitoring of spatio-temporal environmental processes. The statistical design problem is to decide where to locate the stations of a monitoring network. When some of the monitoring stations are mobile, the proposed methodology leads to an optimally adaptive monitoring network that keeps costs under control while maximizing learning.
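The sequential choice of experimental conditions described above can be illustrated with a minimal sketch. It is not the investigator's EMCMC framework. It assumes a much simpler stand-in: a Gaussian process surrogate with a fixed Gaussian kernel, a one-dimensional design space, and a utility equal to the posterior predictive variance, so each new run is placed where the surrogate is most uncertain. The function names, the toy simulator, and the parameter values (length scale, jitter, number of runs) are all hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.2):
    """Gaussian (squared-exponential) kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_variance(x_design, x_cand, noise=1e-6):
    """GP posterior variance at the candidates, assuming unit prior variance."""
    K = sq_exp_kernel(x_design, x_design) + noise * np.eye(len(x_design))
    k_star = sq_exp_kernel(x_design, x_cand)
    solved = np.linalg.solve(K, k_star)
    return 1.0 - np.sum(k_star * solved, axis=0)

def toy_simulator(x):
    """Hypothetical stand-in for an expensive computer model."""
    return np.sin(6.0 * x) + 0.5 * x

def sequential_design(n_runs=8, n_cand=201):
    """Greedy sequential design: run the simulator where uncertainty is largest."""
    x_cand = np.linspace(0.0, 1.0, n_cand)   # candidate experimental conditions
    x_design = np.array([0.5])               # assumed starting condition
    y_design = toy_simulator(x_design)
    for _ in range(n_runs - 1):
        var = posterior_variance(x_design, x_cand)
        x_next = x_cand[np.argmax(var)]      # utility: predictive variance
        x_design = np.append(x_design, x_next)
        y_design = np.append(y_design, toy_simulator(x_next))
    return x_design, y_design

x_design, y_design = sequential_design()
```

Because the kernel parameters are held fixed here, the variance criterion depends only on where the simulator has already been run, not on the observed outputs. A design that adapts to what has been learned, as the abstract describes, would need to update the model from the outputs or use a utility that depends on them, for example one that trades off cost against prediction error.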