Distributional optimization refers to a class of mathematical problems in which the optimization variable of the objective function is a probability measure over some underlying space. Because of its highly technical nature, distributional optimization has remained largely unexplored, with advances made only on specific problem instances. This project proposes a unified framework for attacking challenging distributional optimization problems in a wide range of important application domains. The main objective is to develop a comprehensive theory supporting the principled design of novel and efficient optimization algorithms. Doing so will require establishing connections among several mathematical disciplines, including optimization theory, optimal transport, functional inequalities, and statistics. This will promote the cross-fertilization of ideas and lead to the creation of training material from an interdisciplinary perspective. The resulting open-source packages will be made available to support research in related fields on which our daily lives depend.
Problems in distributional optimization are infinite-dimensional optimization problems in which the optimization variable is a probability measure. Many research problems fall into this class; in particular, any non-convex optimization problem over Euclidean space can be cast as a convex distributional optimization problem, as illustrated below. Traditionally, specific instances of these problems have been studied independently of one another. Formulating these seemingly different optimization problems within a single unified framework will allow more powerful mathematical techniques and tools to be brought to bear. This will lead to deeper insights into the structure of solutions and to efficient algorithms tailored to large-scale applications in artificial intelligence and data science. The proposed framework is based on optimal transport theory, which endows the space of distributions with a natural geometry. The proposed algorithm follows the gradient flow of the objective with respect to this geometry. To achieve scalability, the optimization variable is approximated by a collection of particles, so that the algorithm describes the collective dynamics of the particles. A novel variational approach will be used to approximate the gradient descent direction. The theoretical properties of this algorithm will be investigated thoroughly, including provable performance guarantees, convergence rates, and statistical properties. Case studies will be carried out by specializing the unified framework to applications such as Bayesian inference and distributionally robust learning. Acceleration of the algorithm will also be investigated by incorporating established optimization techniques, such as momentum and variance reduction, to improve convergence rates. Finally, the project will explore how to extend the algorithm from minimization problems to min-max problems arising in game-theoretic applications.
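To make the convexification claim concrete, the standard lifting replaces the point variable with a probability measure: for any (possibly non-convex) objective f on Euclidean space,

$$
\min_{x \in \mathbb{R}^d} f(x) \;=\; \min_{\mu \in \mathcal{P}(\mathbb{R}^d)} \int_{\mathbb{R}^d} f(x)\, d\mu(x),
$$

and the lifted objective is linear, hence convex, in the measure, with minimizers concentrating on the global minimizers of f.

The sketch below illustrates the particle idea on one canonical instance rather than the project's proposed method: for the free energy F(mu) = E_mu[f] + E_mu[log mu], the Wasserstein gradient flow is the Fokker-Planck equation, and its particle discretization is unadjusted Langevin dynamics. The double-well potential, step size, and particle count are hypothetical choices for illustration.

```python
import numpy as np

# A minimal sketch, assuming the canonical free-energy functional
#   F(mu) = E_mu[f] + E_mu[log mu],
# whose Wasserstein gradient flow is the Fokker-Planck equation.
# Discretizing the flow with particles yields unadjusted Langevin
# dynamics; f below is a hypothetical non-convex double-well.

def grad_f(x):
    # gradient of f(x) = (x**2 - 1)**2
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
n_particles, n_steps, step = 1_000, 5_000, 1e-3
x = rng.normal(size=n_particles)  # initial particle cloud ~ N(0, 1)

for _ in range(n_steps):
    # Euler-Maruyama step: the drift descends f, while the Gaussian
    # noise realizes the entropy term of the functional.
    x += -step * grad_f(x) + np.sqrt(2.0 * step) * rng.normal(size=n_particles)

# The empirical measure of the particles approximates the minimizer
# of F, i.e., the Gibbs distribution proportional to exp(-f).
print(f"fraction of particles within 0.5 of a well: "
      f"{np.mean(np.abs(np.abs(x) - 1.0) < 0.5):.0%}")
```

In this simple instance the collective particle dynamics require only the gradient of f and independent noise; the variational approximation of the descent direction described above targets objectives where no such closed-form update is available.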
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.