Humans naturally use dialog and gestures to discuss complex phenomena and plans, especially when they refer to physical aspects of the environment while communicating with each other. Existing robot vision systems can sense people and the environment, but they are limited in their ability to detect the detailed conversational cues people often rely upon (such as head pose, eye gaze, and body gestures) and to exploit those cues in multimodal conversational dialog. Recent advances in computer vision have made it possible to track such detailed cues: robots can now use passive measures to sense the presence of people, estimate their focus of attention and body pose, recognize human gestures, and identify physical references. However, robots have had limited means of integrating such information into models of natural language; heretofore, they have relied on dialog models built for specific domains and/or limited to one-on-one interaction. Separately, recent advances in natural language processing have led to dialog models that can track relatively free-form conversation among multiple participants and extract meaningful semantics about people's intentions and actions. These multi-party dialog models have been used in meeting environments and other domains.

In this project, the PI and his team will fuse these two lines of research to achieve a perceptually situated, natural conversation model that robots can use to interact multimodally with people. They will develop a reasonably generic dialog model that allows a situated agent to track the dialog around it, know when it is being addressed, and take direction from a human operator regarding where it should find or place various objects, what it should look for in the environment, and which individuals it should attend to, follow, or obey. Project outcomes will extend existing dialog management techniques toward a more general theory of interaction management, and will push the current state of the art in vision research toward recognizing subtle nonverbal conversational cues, together with methods for integrating those cues with ongoing dialog interpretation and interaction with the world.
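To make the intended fusion concrete, the sketch below illustrates one simple way perceptual cues (head pose, gaze, gesture) might be combined with linguistic cues from a multi-party dialog model to decide whether the robot is being addressed. All class names, weights, and thresholds are hypothetical assumptions introduced for illustration only; they are not the project's actual design, and a real system would presumably learn such weights from annotated multi-party dialog.

```python
from dataclasses import dataclass

# Hypothetical, simplified fusion of vision and dialog evidence for
# addressee detection. Every name and number here is illustrative.

@dataclass
class PerceptualCues:
    """Passive vision estimates for one speaker at one moment."""
    head_pose_toward_robot: float  # 0..1 confidence the head faces the robot
    gaze_toward_robot: float       # 0..1 confidence the eyes fixate the robot
    gesturing_at_robot: bool       # e.g., a pointing gesture toward the robot

@dataclass
class DialogCues:
    """Cues a multi-party dialog model might extract from the utterance."""
    mentions_robot_name: bool      # explicit vocative, e.g., "Robot, ..."
    second_person_address: bool    # "you"/"your" with no other likely target
    is_imperative: bool            # command-like syntax ("put", "find", ...)

def addressee_score(p: PerceptualCues, d: DialogCues) -> float:
    """Weighted combination of visual and linguistic evidence.

    The weights are illustrative placeholders, not learned values.
    """
    score = 0.0
    score += 0.25 * p.head_pose_toward_robot
    score += 0.25 * p.gaze_toward_robot
    score += 0.15 * (1.0 if p.gesturing_at_robot else 0.0)
    score += 0.20 * (1.0 if d.mentions_robot_name else 0.0)
    score += 0.10 * (1.0 if d.second_person_address else 0.0)
    score += 0.05 * (1.0 if d.is_imperative else 0.0)
    return score

def robot_is_addressed(p: PerceptualCues, d: DialogCues,
                       threshold: float = 0.5) -> bool:
    """The agent attends to an utterance only when fused evidence is strong."""
    return addressee_score(p, d) >= threshold

if __name__ == "__main__":
    # A speaker looks at the robot and issues a command without naming it;
    # visual cues alone can establish the robot as the addressee.
    cues = PerceptualCues(head_pose_toward_robot=0.9,
                          gaze_toward_robot=0.8,
                          gesturing_at_robot=False)
    utterance = DialogCues(mentions_robot_name=False,
                           second_person_address=True,
                           is_imperative=True)
    print(robot_is_addressed(cues, utterance))  # True (score 0.575 >= 0.5)
```

The point of the sketch is the division of labor it assumes: vision supplies continuous estimates of nonverbal cues, the dialog model supplies discrete linguistic evidence, and a single fused decision governs whether the situated agent treats the utterance as directed at it.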
Broader Impacts: This research promises substantial positive societal impacts. Ultimately, the development of effective human-robot interfaces will allow greater deployment of robots to perform dangerous tasks that humans would otherwise have to perform, and will enable greater use of robots for service tasks in domestic environments. As part of the project, the PI will conduct outreach efforts to engage secondary-school students, in the hope that exposure to human-robot interaction (HRI) research will increase their interest in science and engineering studies.