This project will advance social skill training by developing and evaluating a multimodal computational framework specifically targeted at improving public speaking performance through repeated training interactions with a virtual audience that perceives the speaker and produces meaningful nonverbal feedback. Interpersonal skills such as public speaking are essential assets in a wide variety of professions and in everyday life. The ability to communicate in social environments often greatly influences a person's career development and can help build relationships and resolve conflicts. Public speaking is not a skill innate to everyone, but one that can be mastered through extensive training. Nonverbal communication is an important aspect of successful public speaking and interpersonal communication, yet it is difficult to train. This research effort will create the computational foundations to automatically assess interpersonal skill expertise and help people improve their skills using an interactive simulated virtual human framework.
There are three fundamental research goals: (1) developing a probabilistic computational model to learn temporal and multimodal dependencies and to infer a speaker's public speaking performance from acoustic and visual nonverbal behavior; (2) understanding the design challenges of developing a simulated audience that is interactive, believable, and, most importantly, provides meaningful and training-relevant feedback to the speaker; and (3) understanding the impact of the virtual audience on speakers' performance and learning outcomes through a comparative study of alternative feedback and training approaches. This work builds upon the promising results of a pilot research study and upon a prototype virtual human infrastructure that allows seamless integration of automatically modeled interpersonal skill expertise for flexible virtual human interaction and gesture control.
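As a concrete illustration of goal (1), the minimal Python sketch below shows one plausible way acoustic and visual nonverbal features could be fused over time and mapped to a performance estimate. The feature names, the early-fusion scheme, and the untrained logistic scoring model are all illustrative assumptions for exposition, not the project's actual model.

```python
# A minimal sketch (not the project's actual model): fusing acoustic and
# visual nonverbal features over time and inferring a performance score.
# Feature names and the fusion/scoring scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features for one talk of 100 frames.
acoustic = rng.normal(size=(100, 3))  # e.g., pitch, energy, pause rate
visual = rng.normal(size=(100, 2))    # e.g., gaze contact, gesture rate

# Early fusion: concatenate the modalities frame by frame.
frames = np.hstack([acoustic, visual])  # shape (100, 5)

# Capture simple temporal dependencies via frame-to-frame deltas,
# then summarize the whole talk with per-dimension statistics.
deltas = np.diff(frames, axis=0)
talk_vector = np.concatenate([
    frames.mean(axis=0), frames.std(axis=0),
    deltas.mean(axis=0), deltas.std(axis=0),
])

# A hypothetical (untrained) linear scoring model mapping the talk
# vector to a performance estimate in [0, 1] via a logistic link.
w = rng.normal(scale=0.1, size=talk_vector.shape)
score = 1.0 / (1.0 + np.exp(-(w @ talk_vector)))
print(f"estimated performance score: {score:.2f}")
```

In practice the weights would be learned from annotated speaker data, and the hand-crafted delta statistics would be replaced by the probabilistic temporal model the project proposes to develop.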
Virtual audiences have the great advantage that their appearance and behavioral patterns can be precisely programmed and systematically presented to pace the interaction. The algorithms developed in this research to model temporal and multimodal dependencies will have broad applicability beyond public speaking assessment, including healthcare applications. Due to its extensibility and availability, the interactive virtual human technology may also serve as the basis for novel teaching applications in a wide range of areas. The code and data will be made available to the research community and students.