Statistical analysis of large datasets and complex models is in tremendous demand, with applications throughout the natural and social sciences, engineering, business, and biomedicine. A major limitation in meeting this demand lies in current computational algorithms, which fail to scale adequately to large problems and datasets. At the same time, growth in computer processor speeds has slowed dramatically, prompting a shift in the computer hardware industry toward parallelization. As a result, the demands on Bayesian computational algorithms are increasing rapidly even as the platforms underlying them are changing. This work explores several promising new directions for producing highly efficient, scalable algorithms, and corresponding software tools, suitable for general-purpose Bayesian computation.
The work pursues three new directions in Bayesian computation: (1) true parallelization of general-purpose Markov chain Monte Carlo (MCMC) samplers, a class of algorithms traditionally viewed as "inherently serial"; (2) algorithmic and theoretical advances in "static data" particle-based sequential Monte Carlo samplers, which are highly parallelizable but currently fail on many complex, high-dimensional posterior distributions; and (3) new tools for empirically monitoring the convergence of Monte Carlo samplers, designed specifically for complex distributions and high-dimensional problem domains. All facets of the work are directly motivated by applications of Bayesian statistics in chemical kinetics, structural bioinformatics, and systems biology. The work also has immediate applicability to problems in the broader areas of statistical physics, computer science, and molecular simulation.
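To make direction (3) concrete, the sketch below runs several independent random-walk Metropolis chains from dispersed starting points and computes the classical Gelman-Rubin split-R̂ statistic, the most widely used empirical convergence diagnostic for parallel MCMC chains. This is standard textbook machinery, not the new monitoring tools proposed here; the target density, function names, and tuning parameters are all illustrative.

```python
import numpy as np

def random_walk_metropolis(logpdf, x0, n_steps, step, rng):
    """Minimal 1-D random-walk Metropolis sampler (illustration only)."""
    x = x0
    lp = logpdf(x)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp_prop = logpdf(prop)
        # Accept with probability min(1, pi(prop)/pi(x)).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

def split_rhat(chains):
    """Gelman-Rubin potential scale reduction factor on split chains.

    chains: array of shape (m, n) -- m chains of n draws each.
    Each chain is split in half, so disagreement between the first and
    second halves of a single chain also inflates the diagnostic.
    """
    m, n = chains.shape
    half = n // 2
    splits = chains[:, :2 * half].reshape(2 * m, half)
    w = splits.var(axis=1, ddof=1).mean()       # within-chain variance
    b = half * splits.mean(axis=1).var(ddof=1)  # between-chain variance
    var_plus = (half - 1) / half * w + b / half
    return np.sqrt(var_plus / w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logpdf = lambda x: -0.5 * x * x  # standard normal target
    chains = np.stack([
        random_walk_metropolis(logpdf, x0, 4000, 1.0, rng)
        for x0 in (-3.0, -1.0, 1.0, 3.0)
    ])
    print(f"split-Rhat = {split_rhat(chains):.3f}")  # values near 1 suggest mixing
```

Values of split-R̂ much greater than 1 indicate that the chains have not yet mixed into a common distribution; in the high-dimensional, multimodal posteriors motivating this work, such diagnostics can read near 1 while entire modes remain unvisited, which is precisely the failure mode that improved monitoring tools must address.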