Natural language generation is a fundamental component of many real-world applications, such as generating image captions for visually impaired people. This CAREER project will seek to develop new methods for generating higher-quality, more reliable sentences and for ensuring more faithful generation results. The project will also produce open-source software and tools that facilitate the diagnosis of neural generation models, and provide resources for building the next generation of faithful language generation models. The investigator will integrate the research with educational components and enable underrepresented high school students to access Artificial Intelligence and Natural Language Processing research and course materials.

A major challenge that prevents the practical deployment of deep-learning-based natural language generation models is faithfulness. For example, in image captioning, sequence-to-sequence models often exhibit the “hallucination” phenomenon: the generated text mentions an object that does not belong to the context. Similarly, in data-to-text generation (e.g., generating a Wikipedia biography from structured data), deep learning models are prone to generating erroneous entities and attributes that do not appear in the input data. These behaviors significantly degrade the performance of neural generative models, making the faithfulness of the output a significant issue for building the next generation of natural language generation engines. This project will investigate the complex relationships between uncertainty and faithfulness at various levels, and will consider several mitigation strategies. An interactive agent will be built to reason over user-generated text to understand the faithfulness constraint. The goal of this project is to deeply understand how to quantify and assess faithfulness in robust settings, and to build useful open-source software that facilitates this purpose.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2048122
Program Officer: Tatiana Korelsky
Budget Start: 2021-04-01
Budget End: 2026-03-31
Fiscal Year: 2020
Total Cost: $98,599
Name: University of California Santa Barbara
City: Santa Barbara
State: CA
Country: United States
Zip Code: 93106