Effective communication of emotion relies on a complex interplay of complementary non-verbal channels involving facial and vocal changes. In addition to emotion expression, imitation of emotion plays a major role in understanding emotion by replicating it. Effective use of these face-voice channels, and identification of the changes therein, is impaired in many patients suffering from neuropsychiatric disorders. Quantification of such deficits will help advance basic and clinical research, eventually leading to improved diagnostic accuracy and assessment of treatment effects. Current methodology relies most heavily on clinical ratings that may be subjective and dependent on the visual rater. The limited automated methods available for facial expression and voice analysis can recognize the emotion but have been fairly unsuccessful in quantifying its degree. This has created the need for objective automated methods of emotion evaluation that can quantify emotion, supplement clinical ratings, and aid in diagnostic decisions. This project seeks to address these issues by developing and validating advanced automated computerized tools that can objectively and reliably quantify multimodal affect processing. This comprehensive quantified assessment of emotion expression and imitation, using single or combined audio-visual modalities of facial expression and voice, will determine the impact of each channel on emotion understanding and on identification of affect-related differences between patient and control groups, thereby complementing and augmenting current clinical symptom rating scales. The measures we produce will be easy to employ and could facilitate large-scale studies measuring impairment in affect and affect change across disorders that lead to impaired affect.
In Aim 1, we will develop and validate classifier-based methods for facial affect analysis, based on automated temporal action unit profiles, to quantify facial emotion expression and imitation in the presence of speech.
In Aim 2, we will develop and validate emotion classifiers based on spectral and prosodic features extracted from the acoustic signal. These will quantify emotion in expressed and imitated voice. Finally, in Aim 3, we will create a video-based automated emotion expression quantification system that fuses the facial and voice features identified in Aims 1 and 2. The resulting population-specific set of face-voice classifiers will best elucidate patient-control differences in expression and imitation. Results will be compared to clinical ratings. We expect that, on successful completion of the project, we will have an integrated collection of objective facial and speech expression analysis tools that neuropsychiatrists can use to quantify the degree of emotion impairment and study treatment effects. We expect our methods to influence procedures used for diagnosing schizophrenia and perhaps affective disorders and autism spectrum disorders. The methods will be generic and could be further extended to other neuropsychiatric or neurological conditions that cause deficits in emotional expressiveness.

Public Health Relevance

The project seeks to quantify emotion in facial expression and voice by developing advanced computational tools that provide objective measures and so alleviate the challenges posed by the subjective methods of emotion evaluation that neuropsychiatrists use to study disease-induced impairments in emotion production. These well-validated tools will be applied to video datasets of patients with schizophrenia and controls to determine group differences and to study disease progression and treatment effects.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Neural Basis of Psychopathology, Addictions and Sleep Disorders Study Section (NPAS)
Program Officer
Freund, Michelle
University of Pennsylvania
Schools of Medicine
United States
Cao, Houwei; Verma, Ragini; Nenkova, Ani (2015) Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Comput Speech Lang 28:186-202
Cao, Houwei; Savran, Arman; Verma, Ragini et al. (2015) Acoustic and Lexical Representations for Affect Prediction in Spontaneous Conversations. Comput Speech Lang 29:203-217
Savran, Arman; Cao, Houwei; Nenkova, Ani et al. (2015) Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities. IEEE Trans Cybern 45:1927-41
Cao, Houwei; Cooper, David G; Keutmann, Michael K et al. (2014) CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE Trans Affect Comput 5:377-390
Shah, Miraj; Cooper, David G; Cao, Houwei et al. (2013) Action Unit Models of Facial Expression of Emotion in the Presence of Speech. Int Conf Affect Comput Intell Interact Workshops 2013:49-54
Savran, Arman; Cao, Houwei; Shah, Miraj et al. (2012) Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering. Proc ACM Int Conf Multimodal Interact 2012:485-492
Hamm, Jihun; Kohler, Christian G; Gur, Ruben C et al. (2011) Automated Facial Action Coding System for dynamic analysis of facial expressions in neuropsychiatric disorders. J Neurosci Methods 200:237-56
Bitouk, Dmitri; Verma, Ragini; Nenkova, Ani (2010) Class-Level Spectral Features for Emotion Recognition. Speech Commun 52:613-625
Hamm, Jihun; Ye, Dong Hye; Verma, Ragini et al. (2010) GRAM: A framework for geodesic registration on anatomical manifolds. Med Image Anal 14:633-42
Wang, Peng; Verma, Ragini (2008) On classifying disease-induced patterns in the brain using diffusion tensor images. Med Image Comput Comput Assist Interv 11:908-16

Showing the most recent 10 out of 12 publications