There has been increasing interest in affective dialogue systems, motivated by the belief that in human-human dialogues, participants detect and respond, at least to some degree, to the emotions, attitudes, and metacognitive states of other participants. The goal of the proposed research is to improve the state of the art in affective spoken dialogue systems along three dimensions, drawing on the results of prior research in the wider spoken dialogue and affective system communities. First, prior research has shown that not all users interact with a system in the same way; the proposed research hypothesizes that employing different affect adaptations for users with different domain aptitude levels will yield further performance improvement in affective spoken dialogue systems. Second, prior research has shown that users display a range of affective states and attitudes while interacting with a system; the proposed research hypothesizes that adapting to multiple user states will yield further performance improvement in affective spoken dialogue systems. Third, while prior research has shown preliminary performance gains for affect adaptation in semi-automated dialogue systems, similar gains have not yet been realized in fully automated systems. The proposed research will use state-of-the-art empirical methods to build fully automated affect detectors. It is hypothesized that both fully and semi-automated versions of a dialogue system that either adapts to affect differently depending on user class, or adapts to multiple user affective states, can improve performance compared to non-adaptive counterparts, with semi-automation generating the most improvement. The three hypotheses will be investigated in the context of an existing spoken dialogue tutoring system that adapts to the user state of uncertainty. The task domain is conceptual physics of the kind typically covered in a first-year physics course (e.g., Newton's Laws and gravity). To investigate the first hypothesis, a first enhanced system version will be developed; it will use the existing uncertainty adaptation for users with lower domain aptitude, while a new uncertainty adaptation will be developed and implemented for users with higher domain aptitude. To investigate the second hypothesis, a second enhanced system version will be developed; it will use the existing uncertainty adaptation for all turns displaying uncertainty, while a new disengagement adaptation will be developed and implemented for all student turns displaying a second state, disengagement. A controlled experiment with the two enhanced systems will then be conducted in a Wizard-of-Oz (WOZ) setup, with a human Wizard detecting affect and performing speech recognition and language understanding. To investigate the third hypothesis, a second controlled experiment will be conducted, replacing the WOZ system versions with fully automated systems.

The major intellectual contribution of this research will be to demonstrate whether significant performance gains can be achieved in both partially and fully automated affective spoken dialogue tutoring systems 1) by adapting to user uncertainty based on user aptitude levels, and 2) by adapting to multiple user states hypothesized to be of primary importance within the tutoring domain, namely uncertainty and disengagement. The research project will thus advance the state of the art in both spoken dialogue and computer tutoring technologies, while also demonstrating any differing effects of affect-adaptive systems under ideal versus realistic conditions. More broadly, the research and resulting technology will lead to more natural and effective spoken dialogue-based systems, both for tutoring and for more traditional information-seeking domains. In addition, improving the performance of computer tutors will expand their usefulness and thus have substantial benefits for education and society.

Project Report

This project was designed to improve the state of the art in affective spoken dialogue systems, motivated by the belief that in human-human dialogues, speakers detect and respond to the emotions and attitudes of other speakers. First, prior research has shown that not all users interact with a system in the same way. We thus hypothesized that employing different affect adaptations for users with different domain aptitude levels would improve system performance. Second, prior research has shown that users display a range of affective states and attitudes while interacting with a system. We thus hypothesized that adapting to multiple user states would yield performance improvements compared to adapting to only one user state or not adapting at all. Third, while prior research has shown preliminary performance gains for affect adaptation in semi-automated dialogue systems, similar gains have not yet been realized in fully automated systems. We hypothesized that fully and semi-automated affect-adaptive dialogue systems could be developed using empirical methods, and that such systems would improve performance compared to non-adaptive counterparts, with semi-automation generating the most improvement. The three hypotheses were investigated in the context of an existing spoken dialogue tutoring system that adapted to the user state of uncertainty. The task domain was conceptual physics of the kind typically covered in a first-year physics course (e.g., Newton's Laws and gravity). To investigate the first hypothesis, a new system was developed that used the existing uncertainty adaptation for lower aptitude users and a new uncertainty adaptation for higher aptitude users. To investigate the second hypothesis, another new system was developed that responded not only to uncertainty but also to disengagement. To investigate the third hypothesis, controlled experiments with all enhanced systems were conducted in both semi-automated and fully automated conditions. The major intellectual outcome of the research was a demonstration that significant performance gains could be achieved by adapting to multiple user states hypothesized to be of primary importance within the tutoring domain, namely uncertainty and disengagement. The investigations regarding user modeling, in contrast, yielded null results. The research project not only advanced the state of the art in both spoken dialogue and computer tutoring technologies, but also demonstrated the degradation of results in affect-adaptive systems under ideal versus realistic conditions. More broadly, the research and resulting technology will lead to more natural and effective spoken dialogue-based systems, both for tutoring and for more traditional information-seeking domains. In addition, improving the performance of computer tutors will expand their usefulness and thus have substantial benefits for education and society.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0914615
Program Officer
Ephraim P. Glinert
Budget Start
2009-09-01
Budget End
2013-08-31
Fiscal Year
2009
Total Cost
$460,745
Name
University of Pittsburgh
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213