This project explores perception-driven models of human audio-visual emotion using statistical analyses, data-driven computational modeling, and implicit sensing. Emotion underlies and modulates human communication; emotion cues inform the diagnosis of many mental health conditions and are tracked during therapeutic interventions. Research in emotion perception seeks to identify models that describe the felt sense of 'typical' emotion expression -- i.e., an observer/evaluator's attribution of the emotional state of the speaker. This felt sense is a function of how individuals integrate the presented multi-modal emotional information. However, the nature of the interaction among multi-modal cues remains an open question. This project will investigate multi-modal cue integration by studying how emotional inconsistency affects human perceptual judgment. In pursuit of this goal, the research objectives of this proposal are (1) to identify and validate the primary and secondary audio-visual cues responsible for emotion perception, (2) to create a data-driven model that automatically predicts an evaluator's emotion perception, and (3) to predict evaluator state using implicit physiological and body-gesture cues.

The first research objective addresses the open question of how distal cues, the encoding of a speaker's communicative goals, interact to produce the felt sense of specific emotion states. Novel techniques will be used to identify emotionally salient distal cues using emotionally consistent and inconsistent audio-visual information. This identification has implications for the design of emotion classification algorithms and for the emotional behavior of affective agents. The second research thrust addresses the open question of how human-centered models (rather than purely data-driven models) can be designed for emotion classification tasks. The project will investigate the efficacy of novel dynamic structures for modeling emotionally inconsistent information. These structures will provide insights into the development of human-centered emotion classification that is inspired by the emotion perception process rather than driven solely by fluctuations in the data. The third research objective addresses the open question of how audio-visual emotion evaluation tasks affect the evaluator's internal state. We will assess evaluator inattention in the context of emotional evaluation tasks; models that accurately predict evaluator inattention have applications in long-term human-computer and human-robot interaction platforms. The insights gained from this project will facilitate the design of emotion-focused algorithms that replicate the process by which humans interpret and integrate emotional audio-visual signals. They will also aid in the creation of emotional interfaces for health informatics applications, leading to more specifically targeted interventions and treatments for many mental health conditions, including schizophrenia, depression, and autism.
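As a purely illustrative sketch of the prediction task in the second objective (not the project's actual model), the snippet below trains a simple classifier that maps concatenated audio-visual cue features to the emotion label an evaluator reports. The feature set, label set, use of scikit-learn, and the synthetic placeholder data are all assumptions introduced here for illustration.

```python
# Minimal, hypothetical sketch: predicting an evaluator's perceived emotion
# label from audio-visual cue features. Features, labels, and data below are
# synthetic placeholders, not the project's data or method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical per-utterance features, e.g., [mean pitch, vocal energy,
# smile intensity, brow-raise intensity]; labels are perceived-emotion classes.
X = rng.normal(size=(200, 4))        # placeholder audio-visual features
y = rng.integers(0, 3, size=200)     # 0=neutral, 1=happy, 2=angry (placeholder)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("toy accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

In practice, any perception-driven model of this kind would replace the placeholder features with cues validated in the first objective and would account for emotionally inconsistent audio-visual input; this sketch only illustrates the input-output structure of the prediction task.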

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1217104
Program Officer: Kenneth C. Whang
Budget Start: 2012-09-01
Budget End: 2016-08-31
Fiscal Year: 2012
Total Cost: $201,573
Name: University of Texas at Dallas
City: Richardson
State: TX
Country: United States
Zip Code: 75080