Speech is a uniquely human act that involves converting discrete cognitive and linguistic representations into continuous, complex movements of the speech articulators. By improving our understanding of the variability and constraints of this conversion mechanism in healthy speech production, this project contributes to the assessment of speech disorders associated with neurological disease, such as apraxia. Moreover, it paves the way for novel, science-driven paradigms for automatic text-to-speech synthesis, a technology of increasing importance for social inclusion. It also provides a unique interdisciplinary training opportunity for students, exposing them to multiple facets of speech science research. Novel dynamic imaging data of speech production, analysis results, and derived models are shared with the broader research community.

The focus of this project is on the coordinative patterns that govern the movement of speech articulators toward the achievement of speech production goals (articulatory strategies). The aim is to develop a toolset for characterizing variability in these strategies, which will also enable the mapping of phonological representations to speaker-specific dynamics of the vocal tract. The project builds upon the frameworks of Articulatory Phonology and Task Dynamics, which provide a model for generating vocal-tract dynamics from linguistic structures. In this model, the formation of linguistically relevant constrictions in the vocal tract is governed by the temporal deployment of dynamical systems: critically damped oscillators characterized by parameters that include targets (the end goals of articulatory movements) and natural frequencies (which determine the time course of the movement trajectories). This model is updated using real-time magnetic resonance imaging (MRI) data of the vocal tract from 32 speakers, in order to characterize their speech production behavior at three levels: (i) the vocal-tract deformations produced in the act of speaking; (ii) the relative contributions of those deformations toward the achievement of phonological goals; and (iii) the temporal coordination of articulatory gestures in the production of well-formed utterances. Such characterizations of individual, speaker-specific articulatory strategies, observed directly with state-of-the-art vocal-tract imaging technology, contribute significantly to our understanding of what underlies phonological constancy and what drives individual variability arising from phonetic context, speaker anatomy, and speaking style.
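To make the "critically damped oscillator" idea concrete, the sketch below models a single gesture as a unit-mass point attractor, x'' + 2ωx' + ω²(x − T) = 0, where T is the target and ω the natural frequency; the damping 2ω is exactly the critical value, so the tract variable approaches its target without overshoot. This is a minimal illustration under stated assumptions, not the project's toolset: it assumes one tract variable in isolation (no gestural overlap or blending, no activation ramps), a movement that starts from rest, and hypothetical parameter values (a 10 mm aperture closing to 0 mm with ω = 40 rad/s). The function names are likewise hypothetical.

```python
import math


def critically_damped_position(x0, target, omega, t):
    """Closed-form position of a critically damped gesture at time t (seconds).

    Solves x'' + 2*omega*x' + omega**2 * (x - target) = 0 with
    x(0) = x0 and x'(0) = 0 (hypothetical: movement starts from rest).
    """
    return target + (x0 - target) * (1.0 + omega * t) * math.exp(-omega * t)


def simulate_gesture(x0, target, omega, duration, dt=1e-5):
    """Numerically integrate the same equation with semi-implicit Euler steps.

    Returns a list of (time, position) samples, including t = 0.
    """
    x, v, t = x0, 0.0, 0.0
    samples = [(t, x)]
    for _ in range(int(round(duration / dt))):
        # Damping coefficient 2*omega is critical for stiffness omega**2 (unit mass).
        a = -2.0 * omega * v - omega ** 2 * (x - target)
        v += a * dt  # update velocity first ...
        x += v * dt  # ... then position with the new velocity
        t += dt
        samples.append((t, x))
    return samples


# Hypothetical example: a constriction closing from 10 mm toward a 0 mm target.
closed_form = critically_damped_position(10.0, 0.0, 40.0, 0.1)
simulated = simulate_gesture(10.0, 0.0, 40.0, 0.1)[-1][1]
print(f"closed form: {closed_form:.4f} mm, simulated: {simulated:.4f} mm")
```

Raising ω in this sketch shortens the movement without changing where it ends, which mirrors the division of labor described above: the target fixes the end goal, while the natural frequency sets the time course.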

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1908865
Program Officer: D. Langendoen
Budget Start: 2019-08-01
Budget End: 2022-07-31
Fiscal Year: 2019
Total Cost: $474,013
Organization: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089