Humans and robots alike have a critical need to navigate through new environments to carry out everyday tasks: a parent and child may be touring a college campus; a robot may be searching for survivors after a building has collapsed. In this collaboration between faculty at two institutions, the PIs envision human and robotic partners sharing common perceptual-linguistic experiences and cooperating in mundane tasks like janitorial work and home care, as well as in critical tasks like emergency response or search-and-rescue. But while mapping and navigation are now commonplace for mobile robots, human-robot collaboration on even simple tasks confronts a critical barrier: robots and people do not share a common language. Human language is rich in linguistic elements for describing our spatial environment, the objects and places within it, and navigable paths through it (e.g., "go down the hallway and enter the third door on the right"). Robots, on the other hand, inhabit a metric world of occupied and unoccupied discretized grid cells, in which most objects are devoid of meaning (semantics). The PIs' goal in this project is to overcome this limitation by conjoining the well-understood problem of simultaneous localization and mapping (SLAM) with that of language acquisition, enabling robots to learn to communicate with people in English about navigation tasks. The PIs will spur interest in this novel research area within the scientific community by means of an Amazing Race challenge problem, modeled after the reality television show of the same name, which will place robots and human-robot teams in unknown environments and charge them with completing a specific task as quickly as possible. Other outreach activities will include visits to K-12 schools with demonstrations.

This work will focus on simultaneous localization, mapping, and language acquisition, a largely unexplored field of inquiry. The crucial principle is that semantics are formulated as a cost function, which in turn specifies a joint distribution over many variables, including those capturing sensory input, language, the environment map, and robot motor control. The cost function and joint distribution support standard forms of inference, such as command following. More importantly, they support multidirectional inference over multiple variable sets jointly, such as simultaneous mapping and language interpretation. Within this innovative multivariate optimization-based framework, the PIs plan a thorough experimental regimen that includes both synthetic and real-world datasets of challenging environments. The semantics of natural language will be grounded in spatial maps of the realistic visual world and in robot motor control as the robot navigates along particular paths, or to particular destinations, in (possibly novel) environments that are mapped not only geometrically but also with linguistic underpinnings for those paths and destinations. The language approach is compositional, using spatially grounded representations of nouns (objects/places) and prepositions (relations between them); these representations will be modeled in the context of mapping. Furthermore, the PIs will consider realistic environments and adapt visual models thereof according to the joint model. The PIs are aware of no other work that jointly models mapping, vision, and language acquisition.
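To make the idea of a single cost function supporting multidirectional inference concrete, the following is a minimal toy sketch, not the PIs' actual model: a five-cell corridor whose cells are "wall" or "door", a noisy-sensor cost term, and a language-grounding cost term for an instruction like "enter the second door". All names, cell labels, and cost weights here are illustrative assumptions. Minimizing the summed cost jointly over the map hypothesis and the grounded goal performs mapping and language interpretation simultaneously, so the instruction can revise a low-confidence sensor reading.

```python
import itertools

CELLS = range(5)
LABELS = ("wall", "door")
OBS = ("wall", "door", "wall", "wall", "wall")  # sensor's best guess per cell
CONF = (1.0, 1.0, 1.0, 0.2, 1.0)                # sensor confidence per cell (assumed weights)

def observation_cost(map_hyp):
    """Penalty for disagreeing with the sensors, weighted by confidence."""
    return sum(c for m, o, c in zip(map_hyp, OBS, CONF) if m != o)

def language_cost(map_hyp, ordinal, goal_cell):
    """Zero iff goal_cell is the ordinal-th door under map_hyp, else a large penalty."""
    doors = [c for c in CELLS if map_hyp[c] == "door"]
    return 0.0 if ordinal <= len(doors) and doors[ordinal - 1] == goal_cell else 10.0

def infer(ordinal):
    """Jointly minimize the summed cost over map hypotheses and goal
    groundings (exhaustive search, feasible at this toy scale)."""
    return min(
        ((m, g) for m in itertools.product(LABELS, repeat=len(CELLS)) for g in CELLS),
        key=lambda mg: observation_cost(mg[0]) + language_cost(mg[0], ordinal, mg[1]),
    )

# "Enter the second door": the sensors saw only one door, but the instruction
# implies a second; the low-confidence reading at cell 3 is revised to "door".
map_hyp, goal = infer(ordinal=2)
# map_hyp → ("wall", "door", "wall", "door", "wall"), goal → 3
```

In the framework described above, the same cost would instead be minimized over whichever variables are unknown: fixing the map and minimizing over the goal yields command following, while leaving both free yields the simultaneous mapping and interpretation shown here.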

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1522904
Program Officer: Ephraim Glinert
Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2015
Total Cost: $649,999
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109