In recent years, machine learning (ML) and artificial intelligence (AI) have achieved remarkable progress. Computer programs can now categorize images better than humans on some benchmarks, beat the world champion at Go, and make intelligent recommendations in areas from health care to education. Under the hood, many of these technologies rely on highly flexible, structured probabilistic models to express and reason about complex phenomena. As the probabilistic models required by modern machine learning systems grow increasingly complex, the ability to compute probabilities efficiently has become one of the main bottlenecks of these systems. The goal of this project is to develop a new theoretical and algorithmic framework for efficient approximate computation of probabilities in highly complex probabilistic models. The project provides research opportunities for undergraduates and develops educational modules for outreach activities to high school students and undergraduates.
Markov chain Monte Carlo (MCMC) and variational inference (VI) are the two major families of approximate inference algorithms in the literature. However, each has a critical weakness. MCMC is accurate but suffers from slow convergence; VI is typically faster but introduces deterministic errors and lacks theoretical guarantees. This project aims to introduce a new Stein variational paradigm for approximate inference that integrates the advantages of MCMC and VI, enabling algorithms that are as flexible and accurate as MCMC and as fast as VI. The key idea is to directly optimize a non-parametric, particle-based representation to fit intractable distributions using fast deterministic gradient-based updates, which is made possible by integrating and generalizing key mathematical tools from Stein's method, optimal transport, and interacting particle systems. A basic algorithm derived from this framework, called Stein variational gradient descent (SVGD), has already been found to be a powerful tool in a range of applications. This project builds on that initial success by (1) systematically investigating basic theoretical problems, (2) developing more efficient and practical algorithms and software, and (3) demonstrating the framework's power in various interdisciplinary applications, including reinforcement learning and molecular dynamics.
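As a rough illustration of the deterministic particle update behind SVGD, the following is a minimal NumPy sketch, not the project's software. The RBF kernel with a fixed bandwidth `h`, the step size, the particle count, and the one-dimensional Gaussian target are all assumptions chosen for illustration, and the function names `rbf_kernel` and `svgd_step` are hypothetical.

```python
import numpy as np


def rbf_kernel(x, h):
    """Return the RBF kernel matrix and the summed kernel gradients.

    x has shape (n, d). The kernel is k(a, b) = exp(-||a - b||^2 / h),
    with a fixed bandwidth h (an assumption made for this sketch).
    """
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # shape (n, n)
    k = np.exp(-sq_dist / h)
    # grad_k[i] = sum_j d/dx_j k(x_j, x_i) = (2 / h) * sum_j k_ij * (x_i - x_j)
    grad_k = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return k, grad_k


def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """Apply one SVGD update to all particles.

    The first term pulls particles toward high-density regions of the
    target; the second term pushes particles apart so they do not collapse.
    """
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    phi = (k @ grad_log_p(x) + grad_k) / n
    return x + step * phi


# Illustrative target: a 1-D Gaussian with mean 2 and unit variance,
# whose score function is grad log p(x) = -(x - 2).
def grad_log_p(x):
    return -(x - 2.0)


rng = np.random.default_rng(0)
particles = rng.normal(loc=-5.0, scale=1.0, size=(50, 1))  # poor initial guess
for _ in range(1000):
    particles = svgd_step(particles, grad_log_p)

print("particle mean:", particles.mean())
print("particle std: ", particles.std())
```

Only the gradient of the log density is needed, so the normalizing constant of the target never has to be computed. In practice the bandwidth is often set adaptively rather than fixed as it is here.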
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.