This project promotes the development of energy-efficient and linguistically motivated computational methods for understanding human language. Recent advances in text-based tasks such as machine translation and question answering have been fueled by training large-scale neural network models on billions of words. While this brute-force approach has undoubtedly been successful, it also has many downsides. As the computational requirements for training and using these models continue to grow, their carbon footprints have been steadily increasing, and their accessibility has become limited to those at a few well-funded companies and institutions. These models do not explicitly consider the hierarchical nature of language, a well-studied phenomenon in linguistics, which the investigators believe contributes to their overall inefficiency and reduces their interpretability to end users. The technologies developed in this project aim not only to make computational models of language more accessible and efficient, but also to improve the state of the art in text generation tasks such as translation and summarization. The project integrates the newly developed methods into academic settings to provide significant outreach to undergraduates outside of computer science as well as those from underrepresented communities.
To develop this new methodology, the project introduces neural architectures that induce syntactically and semantically relevant tree structures from raw text while simultaneously learning powerful vector-based representations that improve downstream tasks. These models combine insights from self-supervised learning, which allows for powerful representation learning without expensive manual annotation, with a tree-shaped structural bias. The resulting methods are evaluated with respect to three major goals: (1) enabling representation learning across the entire linguistic hierarchy (i.e., words, phrases, sentences, and discourse-level units) within a single architecture; (2) improving the computational and energy efficiency of training and inference; and (3) improving long-form text generation tasks, including document-level translation and text summarization. This effort aims to spur further research into sustainable and scalable language representation learning, and as such its outputs include publicly released pretrained models and open-source code.
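As a concrete illustration of the kind of model described above, the sketch below shows a hypothetical bottom-up encoder that greedily merges adjacent spans to induce a binary tree over a sentence while composing vector representations along the way. It is not the project's actual architecture; all names, dimensions, and design choices (hard greedy merges, a single linear composition function) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GreedyTreeEncoder(nn.Module):
    """Illustrative bottom-up encoder: repeatedly merges the adjacent pair of spans
    with the highest learned score, yielding a binary tree and a root vector.
    (A hypothetical sketch, not the funded project's model.)"""

    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.score = nn.Linear(2 * dim, 1)  # scores how "mergeable" two adjacent spans are

    def forward(self, token_ids: torch.Tensor):
        # token_ids: (seq_len,) LongTensor for a single sentence, kept simple for clarity.
        spans = [self.embed(t) for t in token_ids]   # leaf (word-level) representations
        tree = [int(t) for t in token_ids]           # parallel record of the induced structure
        while len(spans) > 1:
            pairs = torch.stack([torch.cat([spans[i], spans[i + 1]])
                                 for i in range(len(spans) - 1)])
            scores = self.score(pairs).squeeze(-1)   # one score per adjacent pair
            i = int(torch.argmax(scores))            # greedy, non-differentiable merge choice
            spans[i:i + 2] = [self.compose(pairs[i])]
            tree[i:i + 2] = [(tree[i], tree[i + 1])]
        return spans[0], tree[0]                     # sentence-level vector, induced tree

if __name__ == "__main__":
    encoder = GreedyTreeEncoder(vocab_size=1000)
    root_vector, tree = encoder(torch.tensor([5, 17, 42, 8]))
    print(tree)               # e.g. ((5, 17), (42, 8)) once the scorer is trained
    print(root_vector.shape)  # torch.Size([64])
```

In practice, models of this kind typically replace the hard argmax with a differentiable relaxation or a chart-based marginalization over candidate trees, so that the merge scorer can be trained end to end with a self-supervised objective such as masked-token prediction rather than with syntactic annotation.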
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.