Over the past five decades, Markov chain Monte Carlo (MCMC) methods have developed into a versatile and powerful tool for scientific computing. However, conventional MCMC methods cannot sample from distributions whose densities involve intractable integrals. The goal of this project is to develop innovative Monte Carlo algorithms that can sample from such distributions. To this end, the PI proposes a new population Monte Carlo algorithm, Monte Carlo dynamically weighted importance sampling (MCDWIS). In simulations, MCDWIS replaces the ratio of intractable integrals with its Monte Carlo estimate, and the bias thereby introduced is counterbalanced by assigning different weights to the new samples. MCDWIS thus allows Monte Carlo estimates to be used in MCMC simulations while leaving the target distribution invariant with respect to importance weights. Unlike auxiliary variable MCMC methods, MCDWIS does not require perfect samples, and can therefore be applied to many statistical models for which perfect sampling is unavailable or very expensive. As discussed in the proposal, MCDWIS can also be used to sample from incomplete posterior distributions for missing-data and random-effects models (e.g., generalized linear mixed models), which are traditionally treated with the expectation-maximization (EM) or Monte Carlo EM algorithms. In addition to providing a fully Bayesian analysis for these models, MCDWIS can potentially overcome, owing to its self-adjusting mechanism, the local-trap problem suffered by the EM and Monte Carlo EM algorithms. The PI also proposes an importance sampling-targeted stochastic approximation Monte Carlo algorithm, called importance stochastic approximation Monte Carlo, which can be used for Bayesian inference for models with intractable normalizing constants.
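
For orientation, the kind of Monte Carlo estimate referred to above can be written down in generic form. The abstract does not specify the estimator, so the notation here is an assumption: write the model density as f(x | θ) = ψ(x, θ) / Z(θ), where ψ is computable and Z(θ) is the intractable integral of ψ(x, θ) over x. A standard importance sampling identity then expresses the ratio of two such integrals as an expectation under f(· | θ), provided ψ(x, θ') vanishes wherever ψ(x, θ) does:

```latex
\[
\frac{Z(\theta')}{Z(\theta)}
  = \int \frac{\psi(x,\theta')}{\psi(x,\theta)}\, f(x \mid \theta)\, dx
  \approx \frac{1}{m} \sum_{i=1}^{m} \frac{\psi(x_i,\theta')}{\psi(x_i,\theta)},
  \qquad x_1,\dots,x_m \sim f(\cdot \mid \theta).
\]
```

The draws x_i would in practice come from a Markov chain rather than exact sampling, which is one source of the bias that the weighting scheme is meant to counterbalance.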

The intellectual merit of this project lies in providing innovative computational methods that are expected to play a major role in statistical inference for an important class of scientific models, including random graph models used in social network analysis, autonormal models used in spatial data analysis, autologistic models used in disease mapping, and generalized linear mixed models used in biomedical data analysis, among others. Successful inference for these models will enhance understanding of the underlying natural, social, or biological systems. This project will have broader impacts in both the statistical methodology and scientific computing communities. The research results will be disseminated to these communities via direct collaboration with researchers in other disciplines, conference presentations, books, and papers published in academic journals. The project will also have significant impacts on education through the direct involvement of graduate students and the incorporation of results into undergraduate and graduate courses.

Project Report

Over the past several decades, Markov chain Monte Carlo (MCMC) methods have developed into a versatile and powerful tool for scientific computing. However, conventional MCMC methods, such as the Metropolis-Hastings (MH) algorithm and the Gibbs sampler, cannot be applied to sample from distributions with intractable normalizing constants. The goal of this project is to develop innovative Monte Carlo algorithms to tackle this problem. During the past three years, the PI and his co-authors have developed several such algorithms, which are described in turn below.

In Liang (2010), the PI proposed the double Metropolis-Hastings (DMH) algorithm, in which, at each iteration, an appropriate auxiliary variable is drawn by running a short Markov chain, and the normalizing constants are then canceled by augmenting the proposal distribution with the auxiliary variable. This algorithm has become increasingly popular owing to its simplicity and efficiency.

In Liang and Jin (2013), the authors proposed the Monte Carlo Metropolis-Hastings (MCMH) algorithm, which approximates, at each iteration, the normalizing constant ratio in the MH acceptance probability using samples simulated from an appropriate distribution via a short Markov chain. This algorithm has broad implications, since it allows a random quantity to enter the MH acceptance probability. Building on it, the PI has recently developed some advanced Monte Carlo algorithms for big data analysis.

In Jin and Liang (2013a), the authors proposed the Bayesian stochastic approximation Monte Carlo (BSAMC) algorithm. This algorithm works by sampling from a sequence of approximate distributions whose average converges to the target distribution, where the approximate distributions can be obtained using the stochastic approximation Monte Carlo algorithm, developed earlier by the PI and his co-authors.
A strong law of large numbers is established for the BSAMC estimator under mild conditions.

Very recently, the PI and his co-authors proposed the adaptive exchange (AEX) algorithm (Liang et al., 2013). AEX can be viewed as an MCMC extension of the exchange algorithm originally developed by Murray, Ghahramani and MacKay (2006, Proc. 22nd Ann. Conf. on UAI). In AEX, the auxiliary variables are generated via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of the algorithm is established under mild conditions. Compared to the exchange algorithm, AEX removes the requirement that the auxiliary variables be drawn using a perfect sampler, and thus can be applied to many models for which a perfect sampler is unavailable or very expensive.

In addition to the above Monte Carlo algorithms targeted at distribution sampling, the PI and his co-authors have proposed using the varying truncation stochastic approximation MCMC algorithm to find the maximum likelihood estimate for distributions with intractable normalizing constants (Jin and Liang, 2013b). The proposed method has been successfully applied to the exponential random graph model, one of the most popular models in social network analysis. The PI and his co-authors have also successfully applied the proposed algorithms to real problems, such as social network modeling (Jin, Yuan and Liang, 2013) and gene-environment interaction analysis (Yu et al., 2012).

This project has broader impacts in both the statistical methodology and scientific computing communities. The research results have been disseminated to these communities via direct collaboration with researchers in other disciplines and publications in academic journals. The project has also had significant impacts on education through the direct involvement of graduate students, such as I.K. Jin and Q. Song, and the incorporation of the results into a graduate course (STAT605, taught by the PI at Texas A&M University in Spring 2011 and Spring 2012).

Publications

[1] Jin, I.K. and Liang, F. (2013a). Bayesian SAMC for distributions with intractable normalizing constants. Computational Statistics & Data Analysis, in press.
[2] Jin, I.K. and Liang, F. (2013b). Fitting social network models using varying truncation stochastic approximation MCMC algorithm. Journal of Computational and Graphical Statistics, in press.
[3] Jin, I.K., Yuan, Y. and Liang, F. (2012). Bayesian analysis for exponential random graph models using the adaptive exchange sampler. Statistics and Its Interface, revised.
[4] Liang, F. (2010). A double Metropolis-Hastings sampler for spatial models with intractable normalizing constants. Journal of Statistical Computation and Simulation, 80, 1007-1022.
[5] Liang, F. and Jin, I.K. (2013). A Monte Carlo Metropolis-Hastings algorithm for sampling from distributions with intractable normalizing constants. Neural Computation, 25, 2199-2234.
[6] Liang, F., Jin, I.K., Song, Q. and Liu, J.S. (2013). An adaptive exchange algorithm for sampling from distribution with intractable normalizing constants. Manuscript.
[7] Yu, K., Wacholder, S., Chatterjee, N., Wheeler, W., Wang, Z., Caporaso, N., Landi, M.T. and Liang, F. (2012). A flexible Bayesian model for studying gene-environment interaction. PLoS Genetics, 8(1): e1002482. doi:10.1371/journal.pgen.1002482.
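
As an illustration of the double Metropolis-Hastings idea summarized in the report (an auxiliary variable drawn by a short Markov chain so that the normalizing constants cancel in the acceptance ratio), the following is a minimal sketch only, not the implementation from Liang (2010). Everything specific in it is an assumption made for the example: the toy model (a one-dimensional chain of +/-1 spins with unnormalized density exp(theta * S(x))), the N(0, 1) prior, the Gaussian random-walk proposal, the function names, and the tuning values.

```python
import math
import random


def suff_stat(x):
    """Sum of nearest-neighbour products for a 1-D chain of +/-1 spins."""
    return sum(a * b for a, b in zip(x, x[1:]))


def inner_chain(x, theta, sweeps, rng):
    """Short single-site Metropolis chain targeting f(. | theta), started at x."""
    y = list(x)
    n = len(y)
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        # Sum of the neighbours of site i (free boundary conditions).
        nb = (y[i - 1] if i > 0 else 0) + (y[i + 1] if i < n - 1 else 0)
        # Flipping y[i] changes the statistic S by -2 * y[i] * nb.
        log_ratio = -2.0 * theta * y[i] * nb
        if rng.random() < math.exp(min(0.0, log_ratio)):
            y[i] = -y[i]
    return y


def dmh_step(theta, x, log_prior, step, sweeps, rng):
    """One double MH update of theta given the observed data x."""
    # Symmetric random-walk proposal, so the proposal ratio is 1.
    theta_new = theta + rng.gauss(0.0, step)
    # Auxiliary variable: short chain at theta_new, started from the data.
    y = inner_chain(x, theta_new, sweeps, rng)
    # psi(x, t') psi(y, t) / (psi(x, t) psi(y, t')) = exp((t' - t)(S(x) - S(y))),
    # so neither Z(theta) nor Z(theta_new) is ever evaluated.
    log_r = (log_prior(theta_new) - log_prior(theta)
             + (theta_new - theta) * (suff_stat(x) - suff_stat(y)))
    if rng.random() < math.exp(min(0.0, log_r)):
        return theta_new
    return theta


def run_dmh(x, n_iter=2000, step=0.3, sweeps=5, seed=0):
    """Run the sketch sampler and return the list of theta draws."""
    rng = random.Random(seed)

    def log_prior(t):
        # Assumed N(0, 1) prior, up to an additive constant.
        return -0.5 * t * t

    theta, draws = 0.0, []
    for _ in range(n_iter):
        theta = dmh_step(theta, x, log_prior, step, sweeps, rng)
        draws.append(theta)
    return draws


if __name__ == "__main__":
    data = [1] * 20  # hypothetical, fully aligned observation
    draws = run_dmh(data)
    tail = draws[len(draws) // 2:]
    print("posterior mean estimate:", sum(tail) / len(tail))
```

Because the inner chain is short, the auxiliary draw is only approximately distributed according to the model at the proposed parameter, so the sampler is approximate; the number of sweeps trades accuracy against cost.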

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007457
Program Officer: Gabor Szekely
Budget Start: 2010-08-01
Budget End: 2013-07-31
Fiscal Year: 2010
Total Cost: $100,000
Name: Texas A&M Research Foundation
City: College Station
State: TX
Country: United States
Zip Code: 77845