Deploying general-purpose robots on a wide scale, from the home to the workplace, requires a more sustainable model for quickly and robustly training them to perform novel tasks in unknown environments without the intervention of robotics experts. Toward this goal, a range of approaches has been explored to allow an ordinary human user to train a robot through instruction and interaction, specifically by providing evaluative feedback while the robot is learning a task, or by explicitly demonstrating how to perform it. When a person provides feedback or demonstrates a task for another human, they typically describe what they are doing in natural language, providing context, clarification, and/or explanations for their evaluations or actions. This project therefore focuses on developing new computational methods that enable robots to learn more efficiently and robustly from feedback and demonstration by leveraging accompanying natural-language narration as context.

The project develops two new approaches to using language to aid interactive task learning, integrating ideas from language grounding, explanation for deep learning, and learning from rationales. The first approach uses language narration as a form of "supervised attention" that focuses learning on relevant features of the environment, thereby allowing effective learning from limited training data. First, the system learns to ground natural language in the robot's perceptions, building on prior work on automated video captioning and multi-modal linguistic grounding. Next, human linguistic narration is translated into a saliency map over the perceptual field using recent methods for visually explaining the processing of the resulting language-grounding networks. Finally, this saliency map supervises the attention mechanism of a deep reinforcement learning system that learns from feedback and/or demonstration, allowing it to learn faster and more effectively from limited interaction; a sketch of this idea appears below. The second approach uses natural language narration to perform reward shaping: instructions are mapped to intermediate rewards that can be seamlessly integrated into any standard reinforcement learning algorithm, again improving the speed and accuracy of learning (see the second sketch below). Both approaches are evaluated experimentally by using them to learn new tasks and quantitatively comparing the speed and effectiveness of learning with and without linguistic narration, under the hypothesis that narration improves both. Tasks will include simulated video games commonly used to evaluate reinforcement learning as well as real-world robot tasks involving navigation and object manipulation.
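To make the first approach concrete, here is a minimal sketch, assuming a PyTorch policy network with a spatial soft-attention layer. A saliency map derived from the grounded narration (here simply a tensor argument) supervises the network's attention through an auxiliary KL-divergence term added to the usual RL loss. All names (`PolicyWithAttention`, `attention_loss`, `ATTN_WEIGHT`) are illustrative, not the project's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyWithAttention(nn.Module):
    """Toy policy network whose spatial attention can be supervised."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, kernel_size=8, stride=4)  # feature extractor
        self.attn = nn.Conv2d(32, 1, kernel_size=1)            # attention logits
        self.head = nn.Linear(32, n_actions)                   # policy head

    def forward(self, obs):
        feats = F.relu(self.conv(obs))                  # (B, 32, H, W)
        logits = self.attn(feats)                       # (B, 1, H, W)
        b, _, h, w = logits.shape
        attn = F.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)
        pooled = (feats * attn).sum(dim=(2, 3))         # attention-weighted pooling
        return self.head(pooled), attn

ATTN_WEIGHT = 0.1  # trade-off between RL loss and attention supervision

def attention_loss(attn, saliency):
    """KL divergence from the language-derived saliency map (target)
    to the network's attention distribution, both normalized over space."""
    b = attn.size(0)
    p = saliency.view(b, -1)
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)  # target distribution
    q = attn.view(b, -1).clamp_min(1e-8)                # predicted distribution
    return (p * (p.clamp_min(1e-8).log() - q.log())).sum(dim=1).mean()

# Inside any standard RL update, the auxiliary term is simply added:
#   action_logits, attn = policy(obs)
#   loss = rl_loss(action_logits, ...) + ATTN_WEIGHT * attention_loss(attn, saliency)
```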
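Similarly, the second approach can be sketched as an environment wrapper, assuming a Gym-style environment with the classic four-tuple `step` API and a grounding model that scores how well the current observation matches the narrated instruction. The scoring function `score_fn` is a placeholder for the project's learned language-grounding network; using a potential-based shaping term is one standard way to add such intermediate rewards without changing the optimal policy.

```python
import gym

class LanguageShapingWrapper(gym.Wrapper):
    """Adds a language-derived shaping reward to any Gym environment."""
    def __init__(self, env, score_fn, instruction, gamma=0.99, beta=1.0):
        super().__init__(env)
        self.score_fn = score_fn        # maps (observation, instruction) -> scalar potential
        self.instruction = instruction  # natural-language narration for the task
        self.gamma, self.beta = gamma, beta
        self._phi = 0.0

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._phi = self.score_fn(obs, self.instruction)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        phi_next = self.score_fn(obs, self.instruction)
        # Potential-based shaping term: F = gamma * phi(s') - phi(s)
        reward = reward + self.beta * (self.gamma * phi_next - self._phi)
        self._phi = phi_next
        return obs, reward, done, info
```

Because the shaping is applied inside the wrapper, any off-the-shelf reinforcement learning algorithm can consume the modified rewards unchanged, which is the sense in which the abstract describes the rewards as "seamlessly integrated."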

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start:
Project End:
Budget Start: 2019-09-01
Budget End: 2022-08-31
Support Year:
Fiscal Year: 2019
Total Cost: $749,411
Indirect Cost:
Name: University of Texas Austin
Department:
Type:
DUNS #:
City: Austin
State: TX
Country: United States
Zip Code: 78759