With recent advances in machine learning, models have achieved high accuracy on many challenging tasks in natural language processing (NLP), such as question answering, machine translation, and dialogue, sometimes approaching or exceeding human performance on these benchmarks. However, these NLP models are often brittle in many ways: they latch onto spurious artifacts, fail under natural variations in language, are not robust to adversarial attacks, and work on only a few domains. Existing pipelines for developing NLP models offer little insight into these failures, and identifying bugs requires considerable effort from experts in both machine learning and the application domain. This CAREER project addresses this need by developing techniques for more robust training and evaluation pipelines for NLP, providing easy-to-use, scalable, and accurate mechanisms for identifying, understanding, and addressing NLP models' vulnerabilities. The developed methods will support diverse application areas such as conversational agents, sentiment classifiers, and abuse/hate speech detection. Further, the team will engage with developers of NLP models in academia and industry, and will develop a data science curriculum for K-12 education, particularly for students from underrepresented communities.

Based on the notion of a vulnerability as unexpected model behavior under certain input transformations, the team will contribute across the following three thrusts. The first thrust identifies vulnerabilities by testing user-defined behaviors and by searching over the space of possible vulnerabilities. The second thrust develops methods to understand a model's vulnerabilities by tracing the causes of errors to individual training data points and data artifacts. The third thrust develops approaches to address vulnerabilities by injecting the vulnerability definitions directly into the model during training and by using explanation-based annotations to supervise the models. These thrusts build upon the goals of behavioral testing, explanation-based interaction, and architecture agnosticism to support most current and future NLP models and applications.
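
As an illustration of the behavioral-testing idea behind the first thrust, the minimal sketch below checks whether a sentiment prediction stays unchanged under a simple name-substitution transformation. The toy keyword classifier and the names `predict_sentiment`, `swap_name`, and `invariance_failures` are hypothetical placeholders used only for illustration, not the project's actual tooling.

```python
# A minimal invariance-style behavioral test: replacing a person's name in the
# input should not change a sentiment prediction. Inputs whose prediction flips
# under the transformation are reported as test failures (i.e., vulnerabilities).

def predict_sentiment(text: str) -> str:
    """Toy stand-in for any sentiment model's inference call."""
    return "positive" if "great" in text.lower() else "negative"

def swap_name(text: str, old: str = "John", new: str = "Maria") -> str:
    """Input transformation: a person-name substitution that should preserve sentiment."""
    return text.replace(old, new)

def invariance_failures(texts: list[str]) -> list[str]:
    """Return the inputs whose prediction changes under the transformation."""
    return [t for t in texts if predict_sentiment(t) != predict_sentiment(swap_name(t))]

if __name__ == "__main__":
    examples = ["John gave a great talk.", "John was late again."]
    print(invariance_failures(examples))  # an empty list means the model passed this test
```

In practice, such tests would be written against the user's own model and a richer family of transformations (paraphrases, typos, entity swaps), with the search over possible vulnerabilities automated rather than hand-specified as here.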

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2046873
Program Officer: D. Langendoen
Budget Start: 2021-07-01
Budget End: 2026-06-30
Fiscal Year: 2020
Total Cost: $88,376
Name: University of California Irvine
City: Irvine
State: CA
Country: United States
Zip Code: 92697