Despite the successes of machine learning at complex prediction and classification tasks (such as which ad a reader will click, or which word a speaker pronounced), there is growing evidence that "state-of-the-art" predictors can perform significantly less accurately on minority populations than on the majority population. Indeed, a notable study of three commercial face-recognition systems, known as the "Gender Shades" project, demonstrated significant performance gaps across different subpopulations on natural classification tasks. Systematic errors on underrepresented subpopulations limit the overall utility of machine-learned prediction systems and may cause material harm to individuals from minority groups. To address accuracy disparity and systematic biases throughout machine learning, the project pursues a principled study of learning in the presence of diverse populations. The project places high value on education, service to the research community, and wide dissemination of knowledge. The research activities will be accompanied by and integrated with curriculum development, research advising (for students at all levels), service, and outreach to other scientific communities and through popular writing. In addition, in the age of machine learning and big data, the project's societal impact is twofold: making sure that algorithms work for everyone, but also making sure algorithms uncover all potential talent, which exists in all communities.

The project combines theoretical and empirical investigations to develop algorithmic tools for mitigating systematic bias across subpopulations and to answer basic scientific questions about why discrepancies in accuracy across subpopulations emerge in the first place. Specifically, the project aims to ask and resolve questions that arise in the context of learning from diverse populations along three main axes: (1) improving predictions for underrepresented populations: can learning algorithms be developed that provably do not overlook significant subpopulations? (2) representing individuals to improve the ability to audit and repair models; and (3) understanding the causes of biases in common machine-learning models and algorithms.
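
As a concrete illustration of the accuracy disparity the project targets, the following minimal Python sketch audits a classifier's accuracy separately on each subpopulation. It is not part of the award; the data, group names, and error rates are entirely hypothetical placeholders.

```python
# Minimal sketch (hypothetical data): compare a classifier's accuracy across
# subpopulations to surface the kind of per-group disparity described above.
import numpy as np

def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} so gaps across subpopulations are visible."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    }

# Hypothetical labels over two subpopulations of very different sizes.
rng = np.random.default_rng(0)
groups = np.where(rng.random(10_000) < 0.9, "majority", "minority")
y_true = rng.integers(0, 2, size=10_000)
# Simulate a model that errs more often on the underrepresented group.
flip = rng.random(10_000) < np.where(groups == "majority", 0.05, 0.25)
y_pred = np.where(flip, 1 - y_true, y_true)

acc = per_group_accuracy(y_true, y_pred, groups)
print(acc)  # roughly {'majority': 0.95, 'minority': 0.75}
print("accuracy gap:", acc["majority"] - acc["minority"])
```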

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1908774
Program Officer: Sylvia Spengler
Budget Start: 2019-10-01
Budget End: 2022-09-30
Fiscal Year: 2019
Total Cost: $500,000
Name: Stanford University
City: Stanford
State: CA
Country: United States
Zip Code: 94305