This project explores perception-driven models of human audio-visual emotion using statistical analyses, data-driven computational modeling, and implicit sensing. Emotion underlies and modulates human communication. It is assessed in the diagnosis of many mental health conditions and tracked in therapeutic interventions. Research in emotion perception seeks to identify models that describe the felt sense of 'typical' emotion expression -- i.e., an observer/evaluator's attribution of the emotional state of the speaker. This felt sense is a function of how individuals integrate the presented multi-modal emotional information. However, how these multi-modal cues interact remains an open question. This project will investigate multi-modal cue integration by studying how emotional inconsistency affects human perceptual judgment. In pursuit of this goal, the research objectives of this proposal are (1) to identify and validate the primary and secondary audio-visual cues responsible for emotion perception, (2) to create a data-driven model that automatically predicts an evaluator's emotion perception, and (3) to predict evaluator state using implicit physiological and body-gesture cues.
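
To make the stimulus design concrete, the sketch below shows one way emotionally inconsistent audio-visual stimuli could be assembled by cross-pairing the audio track of one emotion with the video track of another; matched pairs serve as consistent controls. The emotion set, file names, and clip inventory are hypothetical placeholders, not the project's actual stimulus pipeline.

    # Illustrative sketch: build an emotionally inconsistent ("McGurk-style")
    # stimulus set by cross-pairing the audio track of one emotion with the
    # video track of another. Emotion labels and file names are hypothetical.
    from itertools import product

    EMOTIONS = ["angry", "happy", "neutral", "sad"]

    # Hypothetical clip inventory: one audio and one video rendition per emotion.
    audio_clips = {e: f"audio_{e}.wav" for e in EMOTIONS}
    video_clips = {e: f"video_{e}.mp4" for e in EMOTIONS}

    def build_stimuli():
        """Pair every audio emotion with every video emotion.

        Matched pairs are 'consistent' controls; mismatched pairs are the
        inconsistent stimuli used to probe audio-visual cue integration.
        """
        stimuli = []
        for a_emo, v_emo in product(EMOTIONS, EMOTIONS):
            stimuli.append({
                "audio_emotion": a_emo,
                "video_emotion": v_emo,
                "consistent": a_emo == v_emo,
                "audio_file": audio_clips[a_emo],
                "video_file": video_clips[v_emo],
            })
        return stimuli

    if __name__ == "__main__":
        for s in build_stimuli():
            tag = "consistent" if s["consistent"] else "inconsistent"
            print(f"{tag}: {s['audio_file']} + {s['video_file']}")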

The first research objective addresses the open question of how distal cues (the encodings of a speaker's communicative goals) interact to produce the felt sense of specific emotion states. Novel techniques will be used to identify emotionally salient distal cues using emotionally consistent and inconsistent audio-visual information. This identification has implications for the design of emotion classification algorithms and for the emotional behavior of affective agents. The second research thrust addresses the open question of how human-centered models (rather than purely data-driven models) can be designed for emotion classification tasks. The project will investigate the efficacy of novel dynamic structures for modeling emotionally inconsistent information. These new structures will provide insights into the development of human-centered emotion classification driven by the emotion perception process rather than solely by fluctuations in the data. The third research objective addresses the open question of how audio-visual emotion evaluation tasks affect the evaluator's internal state. We will assess evaluator inattention in the context of emotional evaluation tasks. Models that can accurately predict evaluator inattention have applications in long-term human-computer and human-robot interaction platforms. The insights gained from this project will facilitate the design of emotion-focused algorithms that replicate the process by which humans interpret and integrate emotional audio-visual signals. These insights will also aid in the creation of emotional interfaces for health informatics applications, leading to more targeted interventions and treatments for many mental health conditions, including schizophrenia, depression, and autism.
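
As a rough illustration of how a data-driven perception model might quantify each modality's contribution, the sketch below trains a simple classifier on audio-only, video-only, and fused audio-visual features and compares their cross-validated accuracies. The features and perceived-emotion labels are synthetic placeholders; this is a minimal sketch of the modality-comparison idea, not the project's modeling approach.

    # Minimal sketch, assuming per-stimulus audio and video feature vectors and
    # perceived-emotion labels collected from evaluators. All data here are
    # synthetic placeholders; the point is the audio-only / video-only / fused
    # comparison, not a specific feature set or classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_stimuli, d_audio, d_video = 200, 20, 30

    X_audio = rng.normal(size=(n_stimuli, d_audio))   # e.g., prosodic statistics
    X_video = rng.normal(size=(n_stimuli, d_video))   # e.g., facial-motion features
    y = rng.integers(0, 4, size=n_stimuli)            # perceived emotion (4 classes)

    feature_sets = {
        "audio-only": X_audio,
        "video-only": X_video,
        "audio-visual": np.hstack([X_audio, X_video]),
    }

    for name, X in feature_sets.items():
        clf = LogisticRegression(max_iter=1000)
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name:>12s} accuracy: {acc:.2f}")

In this framing, the gap between the fused model and the better single-modality model gives a crude indication of how much an evaluator's judgment depends on integrating the two cue streams.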

Project Report

Accomplishments: The project investigated how individuals integrate audio-visual information during emotion perception. This area of research is challenging because audio and video information are correlated, making it difficult to assess how each source of information individually affects perception. However, this knowledge is critical: it shapes how we approach automatic emotion estimation, how we design affective interfaces, and even how we understand fundamental characteristics of human perception. We addressed this challenge by investigating new emotional McGurk effect stimuli. These stimuli are composed of emotionally inconsistent information (e.g., an angry face and a happy voice) and allowed us to determine how changes in one modality (audio or video) result in changes in audio-visual perception. The project has resulted in 11 publications, including one journal paper and a Best Student Paper award at ACM Multimedia 2014. The findings are being extended to understand how emotion perception is affected by mood.

Intellectual Merit: Our goals included exploring emotional McGurk effect perception in a healthy population, studying emotion perception across song and speech, improving emotion classification (work recognized with the Best Student Paper award at ACM Multimedia 2014), and informing the design of assistive technology and diagnostics. The results have led to a new understanding of the audio-visual cues that impact perception and of how these cues change with the emotion expressed. The findings are leading to new designs for emotion classification systems.

Broader Impact: The project has supported a female PhD student, a female MS student (graduated), and undergraduate students, including three male computer science students (one supported by SURE and two by REUs) and three liberal arts students (two female; two supported through UROP). PI Mower Provost developed the course "Applications of Machine Learning in Human-Centered Computing" and extended the material for an undergraduate audience ("Human-Centered Computing" and "Intelligent Interactive Systems"). She also presented the work in academic seminars and as part of ongoing outreach efforts, including Girls Encoded and Tech Day. The data from this project have been evaluated and will be released.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1217183
Program Officer
Kenneth Whang
Budget Start
2012-09-01
Budget End
2014-11-30
Fiscal Year
2012
Total Cost
$256,622
Name
Regents of the University of Michigan - Ann Arbor
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109