The application of machine learning in fields from medicine to mobile data-gathering platforms holds substantial promise. Yet as data comes from an ever greater variety of sources in an ever-shifting world, how can one trust that machine-learned systems have not simply fit idiosyncrasies of the data they observe? This project develops machine learning methods so that such systems are not brittle, sensitive to tiny changes in collected data, or prone to critical mistakes on rare populations. With the growing importance of data analysis in science, industry, and healthcare, principled and practical approaches to robustness, safety, and calibration have immediate and wide-ranging effects. A major goal of the project is to provide decision makers with trustworthy predictions from machine-learned models. A second goal is pedagogical: amid the meteoric rise of machine learning, the opportunity to educate students, researchers, and engineers so that they can actually build trustworthy systems is too often missed; this project aims to develop a curriculum around these challenges.

This project develops robust learning procedures in an effort to build trustworthy machine learning. Three concrete thrusts underpin the work. The first builds on the investigator's work in distributional robustness, which fits models to maximize performance on populations near enough to the available data. The second is to use data creatively and correctly: using the data to define robustness, understand method sensitivities, exploit cheap unlabeled data to build more robust representations, and construct data-based regularization. The third targets confidence and calibration, building models that provide assumption-free valid predictions. Here the aim is to seek predictors with calibrated confidence, something modern learning methods emphatically do not provide, building out of conformal prediction. More generally, distributional shifts challenge statistical machine learning methods, and the project aims for new validation and testing methodologies to understand such shifts, identify situations where methods are sensitive to changes in the underlying data, and allow valid confidence in predictions even in changing environments.
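To make the first thrust concrete for readers: distributionally robust optimization fits a model against the worst-case distribution in a neighborhood of the data rather than against the empirical average. The abstract does not specify the project's formulation, so the sketch below shows only one standard instance, CVaR-style DRO, which minimizes the average logistic loss over the hardest alpha-fraction of examples; the function name cvar_dro_logistic and all hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

def cvar_dro_logistic(X, y, alpha=0.1, lr=0.1, epochs=500):
    """Logistic regression trained on the average loss of the worst
    alpha-fraction of examples (CVaR), a simple instance of
    distributionally robust optimization over reweightings of the
    empirical distribution. Labels y are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    k = max(1, int(np.ceil(alpha * n)))  # size of the worst-case group
    for _ in range(epochs):
        margins = y * (X @ w)
        losses = np.logaddexp(0.0, -margins)   # per-example logistic loss
        worst = np.argsort(losses)[-k:]        # hardest alpha-fraction
        # subgradient of CVaR: average gradient over the worst group
        p = 1.0 / (1.0 + np.exp(margins[worst]))  # sigmoid(-margin)
        grad = -(X[worst] * (y[worst] * p)[:, None]).mean(axis=0)
        w -= lr * grad
    return w

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=300))
w = cvar_dro_logistic(X, y, alpha=0.1)
```

Because the objective averages only over the hardest examples, the fitted model trades some average-case accuracy for better performance on minority subpopulations, which is the behavior the abstract describes.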
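For the third thrust, the abstract names conformal prediction as the starting point for calibrated confidence. The following is a minimal sketch of the standard split conformal procedure, not the project's own method: it wraps any fitted regressor and uses held-out calibration residuals to produce intervals with (1 - alpha) marginal coverage, assuming only exchangeability of the data. The function name and the choice of a linear model are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(model, X_calib, y_calib, x_new, alpha=0.1):
    """Prediction interval for x_new from any fitted regressor, using
    held-out calibration residuals (split conformal prediction)."""
    residuals = np.abs(y_calib - model.predict(X_calib))
    n = len(residuals)
    # finite-sample-corrected empirical quantile of the residuals
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level)
    pred = float(model.predict(np.atleast_2d(x_new))[0])
    return pred - q, pred + q

# toy usage: train on one split, calibrate on another
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=400)
model = LinearRegression().fit(X[:200], y[:200])
lo, hi = split_conformal_interval(model, X[200:], y[200:], X[0], alpha=0.1)
```

The guarantee is distribution-free in the sense the abstract describes: no matter how poor the underlying model, roughly 90% of such intervals (at alpha = 0.1) cover the true response.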

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2020-10-01
Budget End: 2023-09-30
Fiscal Year: 2020
Total Cost: $450,000
Name: Stanford University
City: Stanford
State: CA
Country: United States
Zip Code: 94305