When a posterior distribution has multiple modes, unconditional expectations such as the posterior mean may not be informative summaries of the distribution. Motivated by this problem, the investigator proposes to develop Markov chain Monte Carlo (MCMC) methods that generate sufficient samples from the domain of attraction of every major mode and can therefore estimate both the probability mass of each domain and conditional expectations given a domain. Computational methods will be developed to construct the landscape of a distribution from an MCMC sample. This project will contribute novel methodology for MCMC and Bayesian inference with multimodal posterior distributions and will generalize the theory of adaptive Markov chains. A new algorithm, based on the framework of the multi-domain sampler, will be developed to dynamically group domains separated by low barriers and to construct the tree of sublevel sets of a distribution. In this tree, local modes are terminal nodes and barriers are internal nodes. The project also develops Bayesian inference methods based on domain-based estimation, together with algorithms that quantify the stability of a posterior mode and its domain of attraction, with applications to Bayesian missing-data problems and structure estimation. Convergence and ergodicity of the multi-domain sampler with global moves will be studied within the framework of doubly adaptive MCMC. A theoretical model based on the tree of sublevel sets will be developed to facilitate the analysis of convergence and efficiency of MCMC algorithms.
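The domain-based summaries described above can be illustrated on a toy one-dimensional problem. The sketch below is not the multi-domain sampler; it assumes a hypothetical two-component Gaussian mixture as the target, uses i.i.d. draws in place of an MCMC sample, and partitions a grid into domains of attraction by discrete steepest descent on the energy U(x) = -log p(x). It then estimates each domain's probability mass and conditional mean, and the barrier energy between the two domains, which in a tree of sublevel sets would be the internal node joining the two terminal nodes (the local modes). The functions `domain_labels`, `domain_estimates`, and `adjacent_barriers` are hypothetical names introduced for illustration.

```python
import numpy as np

# Hypothetical bimodal target (assumption for illustration only):
# p(x) = 0.3 N(x; -2, 0.5^2) + 0.7 N(x; 2, 0.8^2).
WEIGHTS = np.array([0.3, 0.7])
MEANS = np.array([-2.0, 2.0])
SDS = np.array([0.5, 0.8])


def log_density(x):
    """Elementwise log of the mixture density p(x)."""
    x = np.asarray(x, dtype=float)[..., None]
    z = (x - MEANS) / SDS
    comp = WEIGHTS * np.exp(-0.5 * z**2) / (SDS * np.sqrt(2.0 * np.pi))
    return np.log(comp.sum(axis=-1))


def domain_labels(energy):
    """Label each grid point with the index of the local minimum of
    U = -log p reached by discrete steepest descent. Points are visited in
    order of increasing energy, so a strictly lower neighbour is already labelled."""
    n = len(energy)
    labels = np.full(n, -1)
    for j in np.argsort(energy):
        nbrs = [k for k in (j - 1, j + 1) if 0 <= k < n]
        best = min(nbrs, key=lambda k: energy[k])
        labels[j] = labels[best] if energy[best] < energy[j] else j
    return labels


def domain_estimates(samples, grid, labels):
    """Estimate the probability mass of each domain and the conditional mean
    E[X | X in domain], assigning each sample to its nearest grid point."""
    h = grid[1] - grid[0]
    idx = np.clip(np.rint((samples - grid[0]) / h).astype(int), 0, len(grid) - 1)
    dom = labels[idx]
    out = {}
    for m in np.unique(labels):
        inside = dom == m
        cond_mean = samples[inside].mean() if inside.any() else np.nan
        out[int(m)] = (inside.mean(), cond_mean)
    return out


def adjacent_barriers(energy, minima):
    """Barrier energy between neighbouring minima: the highest U on the grid
    between them. Each barrier is an internal node joining two subtrees."""
    minima = sorted(minima)
    return [(a, b, energy[a:b + 1].max()) for a, b in zip(minima, minima[1:])]


grid = np.linspace(-6.0, 6.0, 1201)
energy = -log_density(grid)
labels = domain_labels(energy)

# Assumption: i.i.d. draws from the mixture stand in for an MCMC sample.
rng = np.random.default_rng(0)
comp = rng.choice(2, size=20_000, p=WEIGHTS)
samples = rng.normal(MEANS[comp], SDS[comp])

estimates = domain_estimates(samples, grid, labels)
for m, (mass, cmean) in estimates.items():
    print(f"mode near x={grid[m]:+.2f}: mass={mass:.3f}, E[X|domain]={cmean:+.3f}")
for a, b, u in adjacent_barriers(energy, list(estimates)):
    print(f"barrier between {grid[a]:+.2f} and {grid[b]:+.2f}: U={u:.3f}")
```

For this target the estimated masses should be close to the mixture weights and the conditional means close to the component means, whereas the unconditional mean (about 0.8) lies in a low-density region between the modes. Grid-based assignment of samples to domains is only practical in one or two dimensions; higher-dimensional problems would need a different way to map samples to domains.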
Scientific problems in many disciplines can be solved by sampling from a given probability distribution. Monte Carlo methods, in particular Markov chain Monte Carlo, are a class of stochastic simulation algorithms that can draw samples from almost any distribution. However, these algorithms are inefficient when the distribution has multiple local modes. The first significance of the proposed project therefore lies in its applicability to problems in many scientific fields, including statistical physics, chemical physics, and computational biology. In addition, almost no existing methods can extract useful information about a multimodal distribution from Monte Carlo samples. The proposed project includes a systematic development of computational methods that construct novel and comprehensive summaries of a multimodal distribution through a unified graphical representation of its landscape. Such summaries can enhance the current understanding of many problems in statistics and machine learning, for example by quantifying the difficulty of a problem and by visualizing a high-dimensional objective function.