In spoken dialog systems, interpreting user speech input remains a significant challenge due to limited speech recognition and language understanding performance. The problem is further amplified when a user has an accent or is speaking in a noisy environment. However, previous research has shown that, in multimodal systems, fusing two or more information sources can be an effective means of reducing recognition uncertainty, for example through mutual disambiguation. Inspired by this earlier work on multimodal systems, in this project the PI will investigate the role of eye gaze in human-machine conversation, in particular in salience modeling for robust spoken language understanding.

Cognitive studies have shown that human eye gaze is one of the most reliable indicators of what a person is "thinking about," and that it is tightly linked to human language processing. Previous psycholinguistic work has shown that almost immediately after hearing a word, the eyes move to the corresponding real-world referent, and that right before speaking a word, the eyes move to the object about to be mentioned. Not only is eye gaze highly reliable, but it is also an implicit, subconscious reflex that accompanies speech: the user does not need to make a conscious decision; the eyes automatically move toward the relevant object without the user even being aware of it.

Motivated by these psycholinguistic findings, the PI's hypothesis is that during human-machine conversation, user eye gaze coupled with conversation context can signal the part of the physical world (related to the domain and the graphical interface) that is most salient at each point of communication, and thus can potentially be used to tailor the interpretation of speech input. Based on this hypothesis, the PI will seek to improve spoken language understanding in conversational interfaces through a new salience-based framework with two objectives: (1) to better understand the role of eye gaze in human language production and its implications for salience modeling in automated input interpretation; and (2) to develop algorithms and systems that apply computational gaze-based salience modeling to robust spoken language understanding. These objectives will be pursued in four directions:
(a) investigation of the utility of human eye gaze and its implications for salience modeling during human-machine conversation through psycholinguistic studies;
(b) development of computational salience models that integrate eye gaze with conversation context to automatically identify the salient part of the physical world at each point of communication;
(c) development of approaches that apply the new salience models to constrain the hypothesis space for robust spoken language understanding; and
(d) evaluation of the generality of the new approaches in two different applications: an interior design/training application based on a 3D rendered interface, and an information-seeking application using a 2D map-based interface.
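The abstract describes the planned approach only at a high level. As a purely illustrative aid, and not the PI's actual models or system, the minimal Python sketch below shows one way gaze-derived salience could be combined with recognizer confidence to rescore n-best understanding hypotheses. The exponential time decay, the interpolation weight, and all object and utterance names are assumptions made for this example.

```python
# Minimal sketch (illustrative only): gaze-based salience used to rescore
# n-best speech understanding hypotheses. All parameters and identifiers
# here are assumptions, not values from the proposed research.

import math
from collections import defaultdict


def gaze_salience(fixations, now, decay=0.5):
    """Map recent gaze fixations to a salience distribution over objects.

    fixations: list of (object_id, timestamp_sec, duration_sec).
    Each fixation's contribution decays exponentially with time since it
    occurred, so recently and longly fixated objects dominate.
    """
    scores = defaultdict(float)
    for obj, t, dur in fixations:
        scores[obj] += dur * math.exp(-decay * (now - t))
    total = sum(scores.values()) or 1.0
    return {obj: s / total for obj, s in scores.items()}


def rescore_hypotheses(nbest, salience, weight=0.3):
    """Interpolate recognizer confidence with salience of the referenced
    object; return hypotheses sorted best-first.

    nbest: list of (transcript, referenced_object, asr_confidence).
    """
    rescored = []
    for text, obj, conf in nbest:
        score = (1 - weight) * conf + weight * salience.get(obj, 0.0)
        rescored.append((score, text, obj))
    return sorted(rescored, reverse=True)


if __name__ == "__main__":
    # Hypothetical fixations in a 3D interior-design scene.
    fixations = [("sofa_3", 10.2, 0.8), ("lamp_1", 11.5, 1.2), ("sofa_3", 12.0, 0.6)]
    sal = gaze_salience(fixations, now=12.5)
    # Acoustically confusable hypotheses; gaze salience breaks the tie.
    nbest = [("move the lamp", "lamp_1", 0.55), ("move the ramp", "ramp_7", 0.60)]
    print(rescore_hypotheses(nbest, sal))
```

In this toy run, the acoustically stronger but contextually implausible hypothesis ("ramp") is demoted because the user never looked at any ramp, which is the kind of mutual disambiguation the project aims to exploit.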

Broader Impacts: The technologies to be developed in this interdisciplinary project can be applied to many applications, such as virtual training systems where users can view the interface and talk to the computer system at the same time. The technologies will benefit a wide variety of users, particularly individuals who are unable to use their hands to interact with graphical interfaces (e.g., motion-disabled users). Since one major application area of the work is e-training and e-learning, the education and outreach impact of the proposed research is potentially profound; the PI will make specific efforts to transfer the research results into classrooms. The project will also provide a unique opportunity for students in Computer Science, Psychology, and Cognitive Science to work together, and thus will synergize multidisciplinary research activities at Michigan State University.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0535112
Program Officer: Ephraim P. Glinert
Budget Start: 2005-11-15
Budget End: 2009-10-31
Fiscal Year: 2005
Total Cost: $312,000
Name: Michigan State University
City: East Lansing
State: MI
Country: United States
Zip Code: 48824