In situated dialogue, although artificial agents and their human partners are copresent in a shared environment, their knowledge and representations of the shared world differ significantly. When a shared basis for the environment is missing, communication between partners becomes more challenging. Grounding objects of interest through language alone can be difficult and inefficient. Motivated by previous empirical findings on eye gaze in joint attention, in collaboration, and in human language processing, our hypothesis is that eye gaze plays an important role in coordinating the collaborative referring process, especially between partners who have mismatched representations of their shared environment. Based on this hypothesis, the objective of this exploratory project is to examine the role of shared gaze in the collaborative referring process.

This EArly Grant for Exploratory Research aims to generate new findings on how shared gaze coordinates collaborative referring behaviors between partners with mismatched representations of the shared environment. These findings will provide insight into computational approaches and systems that combine gaze modeling with collaborative discourse to ground references. The collected data will support in-depth studies of language processing in situated dialogue.

Project Report

In situated interaction, humans and agents often have mismatched capabilities in perceiving the shared environment. When the shared perceptual basis is missing, communication between humans and agents becomes difficult. In such situations, language alone is insufficient for grounding references to the environment, and non-verbal modalities play an important role. To address this issue, this project investigated the role of shared gaze in the collaborative process of referring in situated dialogue, especially between partners with mismatched perceptual capabilities.

We first developed a system that relies on actual processing errors from computer vision algorithms to simulate lowered visual perceptual capability in a human. On one computer screen, a director (who is assumed to have normal human perception) sees the true image of the original scene. On the other computer screen, a matcher (who is assumed to have lowered perceptual capability) sees only an impoverished image of the same scene, as processed by computer vision algorithms. Given this setup, the director and the matcher collaborate to complete an object naming task: the director needs to communicate to the matcher the secret names of some objects so that the matcher can correctly identify which object has which name. Depending on the experimental condition, either the director's eye gaze or the matcher's eye gaze is made available to the partner during the interaction. We designed a set of experiments to investigate the effects of two factors (mismatched capabilities and shared gaze) on task performance, as well as the interaction between the two factors. A 2 by 2 factorial experiment design was applied.

Our experimental results show that when the shared perceptual basis is missing, the average time to accomplish the naming task is significantly longer. This implies that it is more difficult for partners with mismatched capabilities to reach common ground. The effect of the director's gaze is significant: when the director's gaze was shared with the matcher during the interaction, their collaboration became significantly more efficient. The interaction effect between the two factors is also significant: shared gaze is more helpful under mismatched views than under matched views. The effect of the matcher's gaze is only marginally significant. One observation is that the matchers tended to explore the interface, so their eye gaze may not have been directly linked to the task at hand.

These findings indicate that tracking human eye gaze can be particularly beneficial in mediating a shared perceptual basis in situated interaction. It provides important cues for the agent to recognize its own limitations in perception (e.g., recognition errors and segmentation errors) and facilitates effective collaboration between humans and agents. These findings have important implications for developing artificial agents that can interact with humans in the real world.
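As an illustration of how the two main effects and their interaction are read off a 2 by 2 design, the following minimal Python sketch computes them from cell means. It is not the analysis used in the project: the function name, the condition labels, and the completion times are hypothetical placeholders, and the sketch is purely descriptive (it performs no significance test such as an ANOVA).

```python
from statistics import mean


def factorial_effects(cells):
    """Descriptive effects for a 2 by 2 design.

    `cells` maps (view, gaze) to a list of task completion times, where
    view is "matched" or "mismatched" and gaze is "no_gaze" or "gaze".
    These labels are assumptions made for this sketch.
    """
    m = {condition: mean(times) for condition, times in cells.items()}

    # Main effect of view: mismatched minus matched, averaged over gaze.
    view_effect = (
        (m[("mismatched", "no_gaze")] + m[("mismatched", "gaze")]) / 2
        - (m[("matched", "no_gaze")] + m[("matched", "gaze")]) / 2
    )

    # Main effect of gaze: shared gaze minus no gaze, averaged over view.
    gaze_effect = (
        (m[("matched", "gaze")] + m[("mismatched", "gaze")]) / 2
        - (m[("matched", "no_gaze")] + m[("mismatched", "no_gaze")]) / 2
    )

    # Interaction: change due to gaze under mismatched views minus
    # the change due to gaze under matched views.
    interaction = (
        (m[("mismatched", "gaze")] - m[("mismatched", "no_gaze")])
        - (m[("matched", "gaze")] - m[("matched", "no_gaze")])
    )

    return {"view": view_effect, "gaze": gaze_effect, "interaction": interaction}


# Hypothetical placeholder times (arbitrary units), NOT data from the study.
example_cells = {
    ("matched", "no_gaze"): [10, 12],
    ("matched", "gaze"): [9, 11],
    ("mismatched", "no_gaze"): [20, 22],
    ("mismatched", "gaze"): [13, 15],
}

effects = factorial_effects(example_cells)
```

With these placeholder numbers, a positive view effect means the task takes longer under mismatched views, a negative gaze effect means shared gaze shortens the task, and a negative interaction means gaze shortens the task more under mismatched views than under matched views.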

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer
Tatiana Korelsky
Michigan State University
East Lansing
United States