Increasingly, machine-generated language is being used in everyday interactions, from consumer applications such as intelligent assistants to accessibility tools for simplifying complex documents. This CAREER project seeks to improve methods for text generation to ensure they work consistently when producing documents longer than a few sentences. The investigator will study machine learning-based methods for text generation that aim to ensure that document structure is consistent, that people and places are referred to appropriately, and that factual information is correctly expressed. The award will additionally support the development of open-source tools to make it easier for others to use and extend these approaches for a variety of text generation applications. This research will be integrated into education through new methods for university teaching of machine learning, mentoring for underrepresented students in computer science, and outreach to primary-school students who may not have been exposed to artificial intelligence and data science as an area of study.
The focus of this project is on data-driven natural language generation with deep learning. In recent years, the field has made progress on using new machine learning methods to learn to generate text based on human examples; however, this work is still far from human-level performance, particularly when generating document-level text. This project will develop new machine learning methods using deep generative models. These include learning and controlling document structure with deep hidden Markov models, content determination through generative reference and alignment, and aggregating content into text through bottom-up selection. Each of these combines research into machine learning methods with natural language analysis. The goal of the project is to develop an open-source implementation of text generation methods that allows users to easily target new domains and assess the fidelity and coherence of their systems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.