Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data -- automatic systems can answer questions, perform web search, or carry out commands on our computers. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In benchmarking of NLP systems, however, this linguistic variety is generally unattested. Most commonly, tasks are formulated in canonical American English, with little regard for whether systems will work on language of any other variety. In this work we ask a simple question: can we measure the extent to which the diversity of the language we use affects the quality of results we can expect from language technology systems? Answering this question will allow for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art in these technologies to benefit all users, not just a select few.

Specifically, this work focuses on four aspects of this overall research question. First, we will develop a general-purpose methodology for quantifying how well particular language technologies work across many varieties of language. Measures over multiple speakers or demographic groups are combined into benchmarks that can drive progress in the development of fair metrics for language systems, tailored to the specific needs of design teams. Second, we will move beyond simple accuracy measures and directly quantify the effect that system accuracy has on users in terms of the relative utility they derive from using the system; these measures of utility will be incorporated into our metrics for system success. Third, we will focus on the language produced by people from varying demographic groups, predicting system accuracies from demographic attributes. Finally, we will examine novel methods for robust learning of NLP systems across language and dialectal boundaries, and measure the effect that these methods have on increasing accuracy for all users.
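As a rough illustration of the first aspect, the sketch below shows one way per-group accuracy measures might be combined into a single benchmark score. It is a minimal, hypothetical example, not the project's actual methodology: the function names, the macro-average and worst-group aggregations, and the optional weighting scheme are all assumptions made for illustration.

```python
from collections import defaultdict

def per_group_accuracy(examples, predictions):
    """Compute accuracy separately for each demographic or dialect group.

    `examples` is an iterable of (group, gold_label) pairs and `predictions`
    the corresponding system outputs; both names are illustrative only.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for (group, gold), pred in zip(examples, predictions):
        total[group] += 1
        correct[group] += int(pred == gold)
    return {g: correct[g] / total[g] for g in total}

def fairness_aware_score(group_acc, weights=None):
    """Aggregate per-group accuracies into benchmark-level scores.

    With no weights this is a simple macro-average over groups; a design
    team could instead weight groups by user population or estimated
    utility. The worst-group accuracy is also reported to highlight the
    least-served users.
    """
    groups = sorted(group_acc)
    if weights is None:
        weights = {g: 1.0 / len(groups) for g in groups}
    macro = sum(weights[g] * group_acc[g] for g in groups)
    worst = min(group_acc.values())
    return {"macro": macro, "worst_group": worst}

if __name__ == "__main__":
    # Toy example: three hypothetical dialect groups with made-up labels.
    data = [("A", 1), ("A", 0), ("B", 1), ("C", 0)]
    preds = [1, 0, 0, 0]
    acc = per_group_accuracy(data, preds)
    print(acc)                        # {'A': 1.0, 'B': 0.0, 'C': 1.0}
    print(fairness_aware_score(acc))  # {'macro': 0.667, 'worst_group': 0.0}
```

A weighted or worst-group aggregate of this kind is one plausible way a benchmark could reward systems that serve all user groups rather than only the majority variety.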

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2021-10-01
Budget End: 2024-09-30
Fiscal Year: 2020
Total Cost: $375,000
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213