This project will create a computational theory of visual common ground, allowing users to give directives to a robot (or to other team members) and receive confirmation or constraints through visual communication over a shared visual display. The motivating example is an urban search and rescue (US&R) professional tapping, sketching, and annotating on an iPad in order to direct a small unmanned aerial system (sUAS) without training. Previous work on common ground in human-robot interaction has been limited to natural language, but recent work with unmanned ground robots has shown that having all team members see the robot's-eye view significantly improved performance and situation awareness. The proposed work will populate the computational theory using the Shared Roles Model to represent the inputs (directives, notations), outputs (display viewpoint, form, size, location, content, etc.), and transformations (visual communication engine). The computational theory will be prototyped, refined, and tested by US&R practitioners flying realistic sUAS missions at Texas A&M's Disaster City.
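As a purely illustrative sketch of that representation (the names Directive, DisplayUpdate, and VisualCommunicationEngine are hypothetical and not drawn from the project), the inputs, outputs, and transformation described above might be encoded roughly as follows:

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Directive:
        """Input: a visual directive or notation, e.g., a tap or sketch on the shared display."""
        kind: str                           # "tap", "sketch", or "annotation"
        points: List[Tuple[float, float]]   # gesture coordinates on the display
        issued_by: str                      # role issuing the directive, e.g., "Mission Specialist"

    @dataclass
    class DisplayUpdate:
        """Output: how the shared display should change in response."""
        viewpoint: str                  # e.g., current camera view or a region of interest
        form: str                       # e.g., "spotlight" or "outline"
        size: float
        location: Tuple[float, float]
        content: str                    # e.g., a confirmation or constraint shown to the team

    class VisualCommunicationEngine:
        """Transformation: maps incoming directives/notations to display updates."""
        def transform(self, directive: Directive) -> DisplayUpdate:
            # Placeholder policy: echo the gesture back as a spotlight so the robot
            # (or a teammate) can visually confirm or constrain the directive.
            x, y = directive.points[0] if directive.points else (0.5, 0.5)
            return DisplayUpdate(viewpoint="current", form="spotlight",
                                 size=0.1, location=(x, y),
                                 content=f"ack:{directive.kind}")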
Intellectual merit: The project will create a computational theory of visual common ground that will enable two-way human-robot interaction using visual communication mechanisms such as tapping, sketching, and annotation on shared visual displays on mobile devices such as iPads, smartphones, and tablet PCs. The results will advance the fields of human-robot interaction, artificial intelligence, and cognitive science.
Broader impacts: The results could revolutionize how people use mobile devices to interact with robots (and with each other) using naturalistic visual mechanisms, bypassing extensive training. The project will actively recruit women, Hispanics, and persons with disabilities to participate through REU programs. An open source visual communication toolkit for HRI researchers will be produced. The results will improve robots for public safety, remote medicine, and telecommuting, and could also immediately help save lives through incorporation into Texas Task Force 1.
This project investigated how a visual common ground can improve human-robot interaction for remote presence applications, such as public safety, medical care, and telecommuting. Remote presence applications are those in which one or more humans use a robot to project themselves into an environment in order to complete a time-critical mission. One example is the use of small unmanned aerial systems (sUAS) for fire rescue, law enforcement, border patrols, and inspection of critical infrastructure. It is difficult to imagine eliminating the human from these applications because of the need for human judgment, but at the same time, robots will become increasingly capable and autonomous.

The project constructed three different interfaces and tested them with more than 30 fire rescue professionals through monthly flights, five hazardous materials exercises, and one deployment to the SR530 Mudslides. The first two interfaces relied on visual common ground so that the Mission Specialist and Pilot could see the same things, but each went about their tasks independently. The Dedicated interface allowed the expert (called a Mission Specialist) to see what the robot was seeing, but without the artifacts overlaid on the Pilot's display. The Dedicated Active Mission Specialist interface allowed the Mission Specialist to actively control the camera but did not provide a mechanism for interacting with the Pilot other than verbal directives. The Shared Active interfaces were a major departure, changing both the Mission Specialist's and Pilot's displays. They are a set of interfaces that allow the Mission Specialist and Pilot to communicate with each other by sketching and spotlighting on the shared display of the robot's camera video. The Shared Active interfaces were more popular with responders, could be used by multiple responders at the same time, and reduced subtle safety risks.

The project also refined the Shared Roles Model and used it to predict preconditions for unsafe acts and to design (or modify) the interfaces to eliminate or mitigate the risk. The model assumes that the safe operation of an unmanned system is a function of the robot, the roles that the agents are expected to perform, and the interfaces and team coordination mechanisms; this is very different from existing cognitive architectures, which try to determine the potential for human error independently of the specific hardware and interface. To date, five different categories of preconditions for unsafe acts stemming from role sharing have been identified, and four were predicted and observed in testing with the sUAS.

While this project explored the fundamental science of visual common ground, the findings had immediate value to society and the economy. The Dedicated interface was used for disaster response during the Center for Robot-Assisted Search and Rescue's deployment to the SR530 Washington State Mudslides. It was set up as a backup should a responder join the team, even though the flights were taskable-agent style and no Mission Specialist was needed. The interface clearly fulfilled its purpose of keeping responders from crowding or jostling the operator. Interfaces that allow experts to use sUAS without traveling to a site can revolutionize the economy: the Shared Active interface has attracted great interest because it can allow telework, potentially increasing the efficiency of inspecting critical infrastructure and responding to accidents and disasters.
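As a minimal, hypothetical sketch of how such a shared overlay could work (the Annotation and SharedDisplay names are illustrative, not the project's implementation), each sketch or spotlight event would be broadcast so that the Mission Specialist's and Pilot's displays render the same overlay on the robot's camera video:

    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Annotation:
        """A sketch stroke or spotlight placed on the shared video display."""
        author: str                         # "Mission Specialist" or "Pilot"
        kind: str                           # "sketch" or "spotlight"
        points: List[Tuple[float, float]]   # normalized coordinates on the video frame

    class SharedDisplay:
        """Broadcasts each annotation to every subscribed display so both roles
        see the same overlay on the robot's camera video."""
        def __init__(self) -> None:
            self._subscribers: List[Callable[[Annotation], None]] = []
            self.overlay: List[Annotation] = []

        def subscribe(self, render: Callable[[Annotation], None]) -> None:
            self._subscribers.append(render)

        def annotate(self, annotation: Annotation) -> None:
            self.overlay.append(annotation)
            for render in self._subscribers:
                render(annotation)

    # Usage: either role draws on the shared overlay and both displays update.
    shared = SharedDisplay()
    shared.subscribe(lambda a: print(f"Pilot display renders {a.kind} from {a.author}"))
    shared.subscribe(lambda a: print(f"Mission Specialist display renders {a.kind} from {a.author}"))
    shared.annotate(Annotation("Mission Specialist", "spotlight", [(0.4, 0.6)]))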
At the SR530 mudslides, once the CRASAR team had arrived on-site (1 day of travel) and scouted landing zones and observer positions needed to maintain line of sight (1 day), it still took 2 hours of travel time to reach the landing zone and begin flying. The total flight time was 48 minutes, but the flights had to be conducted over a 3-hour period that included stopping for rain. Ignoring the two days of travel to and from the home office and the one day of scouting, and considering only the day in the field, with a visual common ground interface a geologist or hydrologist would have been needed for only 11% of the field time if they were already in the area, or for only 0.2% of the time if the time needed to travel to the area was included.
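A rough reconstruction of the 11% figure, assuming a 2-hour return trip from the landing zone (not stated above) in addition to the 2 hours out and the 3-hour flight window, gives a field day of about 7 hours:

\[
\frac{48\ \text{min of flight}}{(2 + 3 + 2)\ \text{h} \times 60\ \text{min/h}} = \frac{48}{420} \approx 11\%
\]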