This project is developing methods that allow a computer to automatically learn to understand and generate instructions in human language. Traditional approaches to natural-language learning require linguistic experts to laboriously annotate large numbers of sentences with detailed information about their grammar and meaning. In this project, instructional language is initially learned by simply observing humans following instructions given by other humans. Once the system has learned reasonably well from observation, it also actively participates in the learning process by following human-given instructions itself, or giving its own instructions to humans and observing their behavior. The approach is being evaluated on its ability to interpret and generate English instructions for navigating in a virtual environment (e.g. "Go down the hall and turn left after you pass the chair."). A novel machine learning method infers a probable formal meaning for a sentence from the resulting actions performed by a human follower, and then existing language-learning methods are used to acquire a language interpreter and generator. The learned system is being evaluated in a range of virtual environments, testing its ability to follow human-provided natural language instructions to achieve prescribed goals, as well as to generate natural language instructions that humans can successfully follow to find specific destinations. The methods developed for this project will contribute to the development of virtual agents in games and educational simulations that learn to interpret and generate English instructions, and eventually aid the development of robots that can learn to interpret human language instruction from observation.

Project Report

Intellectual Merit: This project explored grounded language learning by computers. Most research in natural-language processing attempts to comprehend text in isolation; however, fully understanding human language requires capturing the relationship between language and the world. Grounded language learning attempts to acquire knowledge that connects language to perception and action. This project primarily explored grounded language learning for the task of direction following in virtual worlds. It resulted in the development of a series of increasingly capable computer systems that learn to follow natural-language navigation instructions simply by observing humans following such instructions in a virtual environment. Extensive experiments demonstrated the ability of these approaches to learn to follow natural-language instructions in two diverse languages (English and Mandarin Chinese) from such weak, ambiguous supervision with no prior linguistic knowledge. This grant also partially supported additional grounded-language research on video description, including using text mining to aid activity recognition and to generate simple sentences describing single-activity videos. Experimental results on real, short YouTube videos demonstrated the system's ability to accurately describe videos with descriptive English sentences and to improve its descriptions using linguistic knowledge automatically acquired from large text corpora. The project produced eight major scientific conference papers and two PhD theses.

Broader Impacts: The developed methods will aid the development of virtual agents and robots that automatically learn to accept human instruction in natural language. The methods developed for describing videos in natural language will aid the development of systems for video search, descriptive video services for the visually impaired, and automated surveillance.
Navigation instruction data for the project has been made publicly available, and other researchers have already used it to develop and evaluate grounded learning for instructional language. The project has also supported the education of two PhD students and four Masters students in Computer Science in the high-demand areas of natural-language processing and machine learning. These graduates are now working in the U.S. technology industry at Google, Microsoft, and several Internet start-ups.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1016312
Program Officer: Tatiana Korelsky
Budget Start: 2010-09-01
Budget End: 2014-08-31
Fiscal Year: 2010
Total Cost: $450,000
Name: University of Texas Austin
City: Austin
State: TX
Country: United States
Zip Code: 78759