The encoding of emotion in speech is achieved by vocal modulations that require intricate control of human voicing and vocal tract articulation. The aim of this research is to identify and model articulatory processes of emotional speech production using advanced speech production data acquisition technologies, including electromagnetic articulography (EMA) and real-time magnetic resonance imaging (rt-MRI). The research focuses on directly measuring and modeling articulatory kinematics and their interplay with the prosodic modulation of pitch, loudness, and segmental durations in speech emotion expression, in order to understand emotional speech production strategies across emotion types as well as across speakers. The validity of the resulting emotional speech production models is verified with a software articulatory synthesizer in an analysis-by-synthesis fashion. Theoretical implications of the findings are interpreted in relation to the Hyper and Hypo (H&H) theory and the Converter/Distributor (C/D) model of speech production.
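As an illustration of the analysis-by-synthesis idea, the sketch below fits synthesizer parameters by searching for the setting whose synthetic output best matches an observed signal. The `synthesize` function here is a hypothetical toy stand-in (a simple sinusoid generator), not the actual articulatory synthesizer used in this research; a real application would substitute the articulatory model and a perceptually motivated error measure.

```python
import math

def synthesize(params, n=64):
    """Toy stand-in for an articulatory synthesizer: maps two
    parameters (frequency, amplitude) to a short sampled signal."""
    f0, amp = params
    return [amp * math.sin(2 * math.pi * f0 * t / n) for t in range(n)]

def error(target, candidate):
    """Sum-of-squares mismatch between observed and synthesized signals."""
    return sum((a - b) ** 2 for a, b in zip(target, candidate))

def analysis_by_synthesis(target, f0_grid, amp_grid):
    """Grid search over candidate parameters: synthesize each candidate
    and keep the one whose output best matches the target signal."""
    return min(
        ((f0, amp) for f0 in f0_grid for amp in amp_grid),
        key=lambda p: error(target, synthesize(p)),
    )

if __name__ == "__main__":
    observed = synthesize((3.0, 0.8))  # pretend this came from a speaker
    est = analysis_by_synthesis(observed,
                                f0_grid=[1.0, 2.0, 3.0, 4.0],
                                amp_grid=[0.4, 0.6, 0.8, 1.0])
    print(est)  # recovers the generating parameters (3.0, 0.8)
```

The same loop structure carries over to model verification: articulatory parameters proposed by the production model drive the synthesizer, and the acoustic mismatch against recorded emotional speech quantifies how well the model explains the data.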
Detailed knowledge of the effects of emotion on human speech articulation and prosodic patterning has transformative potential for developing improved speech processing technologies for emotional speech recognition and synthesis, which are critical for natural and robust human-machine interfaces. This goal also includes informing the quantitative assessment of expressive speech to characterize atypical or distressed vocal behavior in diverse populations, for instance children with Autism Spectrum Disorder (ASD). Finally, a natural by-product of this research effort is a unique articulatory database that will be shared freely with the community to further expand knowledge of human speech production.