CAREER: Large-Scale Markov Chain Monte Carlo for Reliable Machine Learning

De Sa, Christopher

Abstract

A core capability of intelligence is reasoning about hidden information. Many artificial intelligence (AI) approaches reason about hidden information by constructing a statistical model and then running a statistical inference algorithm to learn hidden information from observed data. But many inference algorithms take a very long time to run when they are learning from a very large amount of data; or, worse, they might run quickly but give the wrong answer. This is problematic as the world trends towards large-scale AI. This project will build new general statistical inference algorithms that will still run efficiently, even on very large datasets and on very complicated models, while having provable reliability guarantees. This will promote the progress of science by making scalable statistical inference reliable. The project will also further education in AI through the development of open-source course resources that give students hands-on experience with how scalability and reliability interact in ML systems.

The project will focus on Markov chain Monte Carlo (MCMC) methods, a class of statistical inference algorithms that work by simulating a random process that converges to a desired statistical model. MCMC methods can give very accurate statistical estimates, but can scale poorly to large datasets and complicated models. This project will address this by building new algorithms that scale to large data and large models with data-subsampling and asynchronous parallelism, respectively. Throughout, it will focus on proving theoretical guarantees that expose the trade-off between scalability and reliability for MCMC.
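To make the phrase "simulating a random process that converges to a desired statistical model" concrete, here is a minimal sketch of a textbook random-walk Metropolis sampler. It is not one of the algorithms this project proposes, and it uses neither data-subsampling nor asynchronous parallelism. The Gaussian target, the step size, the burn-in length, and all function names are illustrative assumptions.

```python
import math
import random


def log_density(x, mean=3.0, std=1.0):
    """Unnormalized log-density of a 1-D Gaussian target (an illustrative choice)."""
    return -0.5 * ((x - mean) / std) ** 2


def random_walk_metropolis(log_p, n_steps, x0=0.0, step_size=1.0, seed=0):
    """Simulate a Markov chain whose stationary distribution is proportional to exp(log_p)."""
    rng = random.Random(seed)
    x = x0
    log_px = log_p(x)
    samples = []
    for _ in range(n_steps):
        # Propose a symmetric Gaussian perturbation of the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_pp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_pp - log_px))
        if rng.random() < accept_prob:
            x, log_px = proposal, log_pp
        # On rejection the chain stays put, and the current state is recorded again.
        samples.append(x)
    return samples


samples = random_walk_metropolis(log_density, n_steps=20000)
burned_in = samples[2000:]  # discard early draws taken before the chain settles
estimate = sum(burned_in) / len(burned_in)
print(f"estimated mean: {estimate:.2f}")
```

In this sketch every step evaluates the full target density. With a large dataset, that evaluation would touch every data point, which is the kind of per-step cost that makes plain MCMC scale poorly.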

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2046760
Program Officer: Rebecca Hwa

Budget Start: 2021-03-15
Budget End: 2026-02-28
Fiscal Year: 2020
Total Cost: $81,332

Cornell University, Ithaca, NY, United States
