Face-to-face communication is a highly dynamic process where participants mutually exchange and interpret linguistic and gestural signals. Even when only one person speaks at the time, other participants exchange information continuously amongst themselves and with the speaker through gesture, gaze, posture and facial expressions. To correctly interpret the high-level communicative signal, an observer needs to jointly integrate all spoken words, subtle prosodic changes and simultaneous gestures from all participants.

The proposed effort endeavors to create a new generation of computational models for modeling the interdependence between linguistic symbols and nonverbal signals during social interactions. This computational framework has wide applicability, including the recognition of human social behaviors, the synthesis of natural animations for robots and virtual humans, improved multimedia content analysis, and the diagnosis of social and behavioral disorders (e.g., autism spectrum disorder). This research effort is an important milestone, complementary to recent research efforts focusing on only two components (e.g., social signal processing, which focuses on nonverbal and social signals). The proposed unified approach to Social-Symbols-Signals will pave the way for new robust and efficient computational perception algorithms able to recognize high-level communicative behaviors (e.g., intent and sentiments) and will enable new computational tools for researchers in behavioral sciences.

The proposed research will advance this endeavor through the development of new probabilistic models for jointly capturing the interdependence between language, gestures and social signals, and novel computational representations, which integrates data-driven processing and logic rule-based approach (so that prior knowledge from social sciences can be easily included). Four fundamental research goals will be directly addressed: symbol-signal representation (joint representation of language and nonverbal), modeling social interdependence (joint modeling of communicative signals between multiple participants), variability in signal interpretations (variability with annotations of high-level communicative signals), and generalization and validation (generalization over different communicative signals and domains).

The proposed research will enable more natural interaction between users and embodied conversational dialogue systems, impacting the way in which computers are used, for example, in tutoring and in cultural and language training. The potential uses of such software and data go far beyond the scope of this project, making it possible, for example, to perform large scale corpus-based studies about social aspects of human face-to-face (multimodal) communication, or cognitive aspects of human multimodal processing. Following the investigators' past experience with sharing research software open-source, code and corpus annotations will be made available to the research community. These shared research results will be valuable for new researchers as well as important educational material for course development.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Application #
Program Officer
Ephraim P. Glinert
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Southern California
Los Angeles
United States
Zip Code