This CAREER project will advance the state of the art in automated discovery of structure within data as diverse as images and video, natural language, audio sequences, and social and biological networks. Contemporary applications of statistical machine learning are dominated by parametric models. This approach constructs models of pre-determined size, with a finite-dimensional vector of parameters that is tuned using training data. To be effective, the underlying structure of such models must be manually specified by experts with application-specific knowledge. This presumed structure limits what can be learned even from very large datasets.

Bayesian nonparametric (BNP) models instead define distributions over models of arbitrary size, built on infinite-dimensional spaces of functions, partitions, or other combinatorial structures. They lead to flexible, data-driven unsupervised learning algorithms, and to models whose internal structure continually grows and adapts to new observations. While promising, Bayesian nonparametric models remain an incompletely developed technology that poses significant practical challenges. This CAREER project will increase the practical feasibility and impact of Bayesian nonparametric approaches by pursuing three interrelated themes:

1) Nonparametric Model Design and Evaluation. New families of models for data with hierarchical, spatial, temporal, or relational structure will be investigated. Emphasis will be placed on quantitative validation of the statistical assumptions and biases inherent in these models, evaluating whether they align with the empirical statistics of important application areas.
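As a concrete illustration of the infinite-capacity priors underlying such models (a minimal sketch for exposition, not code from the proposal), the stick-breaking construction of the Dirichlet process draws infinitely many mixture weights, of which only finitely many are numerically non-negligible; a sampler can therefore truncate once the remaining stick length is tiny:

```python
import numpy as np

def stick_breaking_weights(alpha, rng, tol=1e-8):
    """Sample mixture weights from a Dirichlet process prior via
    stick-breaking: beta_k ~ Beta(1, alpha), and weight k is beta_k
    times the length of stick remaining after the first k-1 breaks.
    Truncate once the remaining stick length falls below `tol`."""
    weights = []
    remaining = 1.0
    while remaining > tol:
        beta = rng.beta(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return np.array(weights)

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=2.0, rng=rng)
# The concentration alpha controls how many weights are appreciable:
# larger alpha spreads mass over more components.
```

Because the truncation is driven by the data-independent stick length, the effective number of components is chosen by the prior draw itself rather than fixed in advance, which is the sense in which the model's size is unbounded.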

2) Reliable Structure Discovery. Statistical inference algorithms will be developed that move beyond the local moves of standard (and widely used) Monte Carlo and variational methods. Compelling examples show that local optima are a significant problem for contemporary methods, motivating a family of novel algorithms that dynamically adjust model complexity as learning proceeds.

3) Scalable and Extensible Nonparametric Learning. Common patterns will be identified across a wide range of popular nonparametric models, suggesting a corresponding family of scalable, parallelizable online learning algorithms. The proposed "memoized" online variational inference algorithm avoids some practical instabilities and sensitivities of conventional methods, while allowing provably correct optimization of the nonparametric model's structure and complexity.
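The bookkeeping at the heart of memoized online inference can be sketched in a few lines (a simplified illustration with hypothetical class and method names; the real algorithm caches richer per-batch sufficient statistics than the per-component counts shown here). Each batch's summary is memoized, so revisiting a batch subtracts its stale summary before adding the fresh one, keeping the global statistics exact rather than stochastically approximated:

```python
import numpy as np

class MemoizedStats:
    """Minimal sketch of memoized sufficient-statistic updates.
    Global statistics stay exactly equal to the sum of the most
    recent per-batch summaries, with no learning-rate decay."""

    def __init__(self, num_components):
        self.global_counts = np.zeros(num_components)
        self.batch_cache = {}  # batch_id -> last summary for that batch

    def update_batch(self, batch_id, new_counts):
        # Subtract the stale cached summary (zero on the first visit),
        # then add the freshly computed one.
        old = self.batch_cache.get(batch_id, np.zeros_like(new_counts))
        self.global_counts += new_counts - old
        self.batch_cache[batch_id] = new_counts

stats = MemoizedStats(num_components=3)
stats.update_batch(0, np.array([1.0, 2.0, 3.0]))
stats.update_batch(0, np.array([2.0, 2.0, 2.0]))  # revisit batch 0
# global_counts now reflects only the latest summary of batch 0.
```

This replace-not-accumulate update is what distinguishes the memoized scheme from stochastic variational inference, and it is also what makes component birth and merge moves safe: statistics can be recomputed consistently when the model's complexity changes.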

An extensible software package, "BNPy: Bayesian Nonparametric Learning in Python," is under development to allow easy application of the novel learning algorithms to a wide range of current and future BNP models. The education and outreach plan of this CAREER project leverages this software to create interdisciplinary undergraduate research teams exploring applications in the natural and social sciences, as well as a week-long summer school on Bayesian nonparametrics to be held twice at Brown University's Institute for Computational and Experimental Research in Mathematics (ICERM).

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1758028
Program Officer: Rebecca Hwa
Budget Start: 2017-09-01
Budget End: 2021-02-28
Fiscal Year: 2017
Total Cost: $331,349
Name: University of California Irvine
City: Irvine
State: CA
Country: United States
Zip Code: 92697