This research will create a new generation of computational tools, called contextual prediction models, for analyzing and modeling social nonverbal communication in human-centered computing. This computational study of nonverbal communication not only encompasses recent advances in machine learning, pattern analysis and computer vision, but goes further by developing and evaluating new algorithms and probabilistic models specifically designed for the domain of social and nonverbal communication. The ability to collect, analyze and ultimately predict human nonverbal cues will provide new insights into human social processes and enable new human-centric applications that can understand and respond to this natural human communicative channel.

This new endeavor will advance through the development of prediction models and their accompanying selection algorithms and feature representations for predicting human nonverbal behavior given a social context (such as the immediately preceding verbal and nonverbal behaviors of a conversational partner). The investigator's previous work has demonstrated the feasibility of using machine learning approaches to model nonverbal communication: probabilistic sequential models were shown to improve the performance of nonverbal behavior recognition during human-robot interactions and to enable natural animation of virtual humans. This project directly addresses three fundamental challenges: feature representation (an optimal mathematical representation of the social context), feature selection (identifying the subset of the social context relevant to predicting nonverbal behaviors) and probabilistic modeling (efficiently learning the predictive relationship between social context and nonverbal behaviors). This research will evaluate and test the generalization of the computational tools using a large corpus of natural interactions in different settings (human-human, human-robot and human-computer) and domains (e.g., storytelling, interviews, and meetings).

These prediction models will have broad applicability, including the improvement of nonverbal behavior recognition, the synthesis of natural animations for robots and virtual humans, the training of culture-specific nonverbal behaviors, and the diagnosis of social disorders (e.g., autism spectrum disorder). The code resulting from this work will be made available to the research community through an open-source Matlab toolbox. The outcome of this research effort will make state-of-the-art computational models more accessible to researchers who aim to analyze social nonverbal communication and develop natural and productive human-centered computing technologies.

Project Report

The overarching goal of this project was to advance the science of computational models of social nonverbal communication by developing prediction models of human nonverbal behavior during social interactions. The ability to collect, analyze and ultimately predict human nonverbal cues will provide new insights into human social processes and enable new human-centric applications that can understand and respond to this natural human communicative channel. This project addressed three fundamental challenges: (1) Context Representation: a mathematical representation of the contextual information available during social interactions; (2) Audio-Visual Feature Analysis: automatic selection of the contextual features most relevant to a specific nonverbal behavior; and (3) Joint Computational Model: a probabilistic sequential model that efficiently learns the predictive relationship between context and nonverbal behavior.

First, this project advanced our understanding of how to represent contextual features by taking into account differences between individuals. Not all speakers interact the same way: some are extremely expressive, while others use only a limited set of behaviors to trigger listener feedback. We proposed a new speaker-adaptive context representation which automatically matches the current speaker with the most similar speakers in our database. Our experiments on a challenging storytelling dataset show that this speaker-adaptive approach significantly outperforms the conventional non-adaptive approach.

Second, we proposed a generative approach called Co-HMM to learn the multimodal features most predictive for continuous emotion recognition. This approach takes advantage of the complementarity of multimodal data by learning separate Hidden Markov Models for each modality before concatenating them into one integrated Co-HMM model. This simple but efficient approach to modeling and analyzing audio-visual data was successfully applied to the problem of continuous emotion recognition at the 2nd Audio-Visual Emotion Challenge (AVEC 2012), where our paper won second place in the word-level emotion recognition sub-challenge.

Third, we proposed a new computational model to learn the joint influence between speaker and listener during dyadic interactions. The new model, called Mutual-LMDE, learns separate predictive experts for speaker and listener visual behaviors and integrates them using a latent sequential model that identifies commonality and synchrony between listener and speaker behaviors. Our Mutual-LMDE model outperforms the conventional approach, which ignores this mutual information.

The research performed as part of this grant was disseminated through 18 peer-reviewed publications, including two journal articles and papers at top conferences such as the Annual Meeting of the Association for Computational Linguistics (ACL) and the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), where it won the best paper award in the virtual human track.
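To illustrate the speaker-adaptation idea described above, the following minimal sketch matches a new speaker against a database of previously seen speakers using a nearest-neighbor comparison of simple behavior statistics. The function names, the signature features (per-dimension mean and standard deviation), and the Euclidean distance are illustrative assumptions, not the representation developed in the project.

```python
import numpy as np

def speaker_signature(features):
    """Summarize a speaker as the mean and std of their contextual features (T x D array)."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

def find_similar_speakers(new_speaker_feats, database, k=5):
    """Return the k database speakers whose behavior signatures are closest to the new speaker.

    database: dict mapping speaker_id -> (T_i, D) array of contextual features.
    """
    query = speaker_signature(new_speaker_feats)
    dists = {
        spk: np.linalg.norm(query - speaker_signature(feats))
        for spk, feats in database.items()
    }
    return sorted(dists, key=dists.get)[:k]

# Hypothetical usage: adapt the prediction model by training only on the matched speakers.
# matched = find_similar_speakers(current_speaker_features, database, k=5)
# training_data = [database[s] for s in matched]
```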
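The Co-HMM description (separate per-modality Hidden Markov Models later combined into one model) can be sketched as below, assuming the hmmlearn library. Concatenating per-frame state posteriors from the audio and visual HMMs is one plausible way to form the combined representation; the actual Co-HMM formulation used in the AVEC 2012 submission may differ.

```python
import numpy as np
from hmmlearn import hmm

def train_modality_hmm(sequences, n_states=4):
    """Fit a Gaussian HMM on one modality, given a list of (T_i, D) feature arrays."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def combined_state_features(audio_model, visual_model, audio_seq, visual_seq):
    """Concatenate per-frame state posteriors from the two modality HMMs.

    The resulting (T, 2 * n_states) representation could feed a downstream
    predictor for continuous emotion labels (illustrative stand-in for the
    integrated Co-HMM step).
    """
    gamma_audio = audio_model.predict_proba(audio_seq)    # (T, n_states)
    gamma_visual = visual_model.predict_proba(visual_seq) # (T, n_states)
    return np.hstack([gamma_audio, gamma_visual])
```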

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0917321
Program Officer: William Bainbridge
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $495,920
Indirect Cost:
Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089