Dynamics of Vocal Tract Shaping

Narayanan, Shrikanth

Abstract

The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of dynamic vocal tract constriction actions in order to understand the control and production of the compositional units of spoken language. We have pioneered the use of real-time MRI for speech imaging to illuminate articulatory dynamics and to understand how these emerge lawfully from the combined effects of vocal tract constriction events distributed over space (subparts of the tract) and over time. This project has developed and refined a novel real-time MRI acquisition ability, making possible current reconstruction rates of up to 96 frames per second, quadrupling current imaging speeds. Data show clear real-time movements of the lips, tongue, velum and epiglottis, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. The project has also developed novel noise-mitigated, image-synchronized strategies to record speech in situ during imaging, as well as image processing strategies for deriving linguistically meaningful measures from the data, demonstrating the utility of this approach for linguistic studies of speech communication in a variety of languages. Using our direct access to dynamic information on vocal tract shaping, we investigate vocal tract shaping in three dimensions as the composition of spatiotemporally coordinated vocal tract action units. This project's specific aims go beyond the dynamic shaping of individual vowels and consonants (postures over time) to examine more complex structuring of articulation, namely the local and global influences governing linguistic control, temporal coherence and multi-unit coordination.
The advances in our technical approach enable a series of studies that leverage: (i) unprecedented high-speed imaging with dynamic rtMRI to consider the prosodic modulation of temporally rapid and temporally coherent speech units; (ii) innovative multi-plane 3D imaging capability to inform the computational identification of linguistic control regimes; and (iii) a large-scale rtMRI corpus and concomitant machine learning advances to move toward a principled account of system-level co-variability in space and time, both within and among individuals. This symbiotic theory-driven and data-driven research strategy will yield significant innovations in understanding spoken communication. It is no exaggeration to say that the advent of real-time MRI for speech has initiated a dramatic scientific change in the nature of speech production research by allowing for models of production driven by rich quantitative articulatory data. The project is having broad impact through the free dissemination of the unique rtMRI data corpora, tools and models (already used worldwide for research and teaching) and through societal outreach via its website and lay media coverage. Understanding articulatory compositional structure and cross-linguistic potentialities also has critical translational significance for the assessment and remediation of speech disorders, as our collaborative work on glossectomy and apraxia has begun to demonstrate.

Public Health Relevance

Real-time imaging of the moving vocal tract with MRI has made direct movies of speech production possible, allowing an investigation of the articulatory composition of speech in healthy adults and illuminating the articulatory dissolution and lack of coherence often found in spoken language disorders. This technology platform, coupled with a linguistically driven theoretical framework that understands speech as composed of articulatory units, provides a scientific foothold for evidence-driven assessment and remediation of speech breakdown in clinical populations, including articulatory remediation and training and the deployment of assistive technologies for the impaired (automatic speech recognition, machine speech synthesis). It also has potential broad impact on the clinical needs of those with swallowing disorders or sleep apnea, or those facing recovery of speech function after stroke or surgery. Further, because speech presents the only example of rapid, cognitively controlled, internal movements of the body, the unique challenges of speech production imaging offer the wider biomedical imaging community traction for advances that improve temporal and spatial image resolution, advances with potential import for cardiac and other imaging.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Research Project (R01)
Project #: 5R01DC007124-14
Application #: 9829092
Study Section: Language and Communication Study Section (LCOM)
Program Officer: Shekim, Lana O

Project Start: 2005-05-01
Project End: 2020-11-30
Budget Start: 2019-12-01
Budget End: 2020-11-30
Support Year: 14
Fiscal Year: 2020

Institution

Name: University of Southern California
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 072933393

City: Los Angeles
State: CA
Country: United States
Zip Code: 90089

Publications

Lim, Yongwan; Zhu, Yinghua; Lingala, Sajan Goud et al. (2018) 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med :

Lammert, Adam C; Shadle, Christine H; Narayanan, Shrikanth S et al. (2018) Speed-accuracy tradeoffs in human speech production. PLoS One 13:e0202180

Vaz, Colin; Ramanarayanan, Vikram; Narayanan, Shrikanth (2018) Acoustic Denoising using Dictionary Learning with Spectral and Temporal Regularization. IEEE/ACM Trans Audio Speech Lang Process 26:967-980

Parrell, Benjamin; Narayanan, Shrikanth (2018) Explaining Coronal Reduction: Prosodic Structure and Articulatory Posture. Phonetica 75:151-181

Gupta, Rahul; Audhkhasi, Kartik; Jacokes, Zach et al. (2018) Modeling multiple time series annotations as noisy distortions of the ground truth: An Expectation-Maximization approach. IEEE Trans Affect Comput 9:76-89

Lingala, Sajan Goud; Zhu, Yinghua; Lim, Yongwan et al. (2017) Feasibility of through-time spiral generalized autocalibrating partial parallel acquisition for low latency accelerated real-time MRI of speech. Magn Reson Med 78:2275-2282

Hagedorn, Christina; Proctor, Michael; Goldstein, Louis et al. (2017) Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging. J Speech Lang Hear Res 60:877-891

Töger, Johannes; Sorensen, Tanner; Somandepalli, Krishna et al. (2017) Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J Acoust Soc Am 141:3323

Lingala, Sajan Goud; Zhu, Yinghua; Kim, Yoon-Chul et al. (2017) A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn Reson Med 77:112-125

Lingala, Sajan Goud; Sutton, Brad P; Miquel, Marc E et al. (2016) Recommendations for real-time speech MRI. J Magn Reson Imaging 43:28-44

Showing the most recent 10 out of 45 publications
