Increasingly, machine-generated language is being used in everyday interactions, from consumer applications such as intelligent assistants to accessibility tools for simplifying complex documents. This CAREER project seeks to improve methods for text generation so that they work consistently when producing documents longer than a few sentences. The investigator will study machine learning-based methods for text generation that aim to ensure that document structure is consistent, that people and places are referred to appropriately, and that factual information is correctly expressed. The award will additionally support the development of open-source tools to make it easier for others to use and extend these approaches for a variety of text generation applications. This research will be integrated into education through new methods for university teaching of machine learning, mentoring for underrepresented students in computer science, and outreach to primary-school students who may not have been exposed to artificial intelligence and data science as an area of study.

The focus of this project is on data-driven natural language generation with deep learning. In recent years, the field has made progress in using new machine learning methods to learn to generate text from human-written examples; however, this work is still far from human-level quality, particularly when generating document-level text. This project will develop new machine learning methods using deep generative models. These include learning and controlling document structure with deep hidden Markov models, content determination through generative reference and alignment, and aggregating content into text through bottom-up selection. Each of these combines research into machine learning methods with natural language analysis. The goal of the project is to develop an open-source implementation of text generation methods that allows users to easily target new domains and assess the fidelity and coherence of their systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2037519
Program Officer: Tatiana Korelsky
Budget Start: 2019-07-01
Budget End: 2024-01-31
Fiscal Year: 2020
Total Cost: $200,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850