The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of dynamic vocal tract constriction actions in order to understand the control and production of the compositional units of spoken language. We have pioneered the use of real-time MRI for speech imaging to illuminate articulatory dynamics and to understand how these emerge lawfully from the combined effects of vocal tract constriction events distributed over space (subparts of the tract) and over time. This project has developed and refined a novel real-time MRI acquisition ability, making possible reconstruction rates of up to 96 frames per second, quadrupling current imaging speeds. Data show clear real-time movements of the lips, tongue, velum and epiglottis, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. The project has also developed novel noise-mitigated, image-synchronized strategies to record speech in situ during imaging, as well as image processing strategies for deriving linguistically meaningful measures from the data, demonstrating the utility of this approach for linguistic studies of speech communication in a variety of languages. Using our direct access to dynamic information on vocal tract shaping, we investigate vocal tract shaping in three dimensions as the composition of spatiotemporally coordinated vocal tract action units. This project's specific aims go beyond the dynamic shaping of individual vowels and consonants (postures over time) to examine more complex structuring of articulation, namely the local and global influences governing linguistic control, temporal coherence and multi-unit coordination.
The advances in our technical approach enable a series of studies that leverage: (i) unprecedented high-speed imaging with dynamic rtMRI to consider the prosodic modulation of temporally rapid and temporally coherent speech units; (ii) innovative multi-plane 3D imaging capability to inform the computational identification of linguistic control regimes; and (iii) a large-scale rtMRI corpus and concomitant machine learning advances to move toward a principled account of system-level co-variability in space and time, both within and among individuals. This symbiotic theory-driven and data-driven research strategy will yield significant innovations in understanding spoken communication. It is no exaggeration to say that the advent of real-time MRI for speech has initiated a dramatic scientific change in the nature of speech production research by allowing for models of production driven by rich quantitative articulatory data. The project is having broad impact through the free dissemination of the unique rtMRI data corpora, tools and models (already used worldwide for research and teaching) and societal outreach through its website and lay media coverage. Understanding articulatory compositional structure and cross-linguistic potentialities also has critical translational significance impacting the assessment and remediation of speech disorders, as our collaborative work on glossectomy and apraxia has begun to demonstrate.

Public Health Relevance

Real-time imaging of the moving vocal tract with MRI has made direct movies of speech production possible, allowing an investigation of the articulatory composition of speech in healthy adults and illuminating the articulatory dissolution and lack of coherence often found in spoken language disorders. This technology platform, coupled with a linguistically driven theoretical framework that understands speech as composed of articulatory units, provides a scientific foothold for evidence-driven assessment and remediation of speech breakdown in clinical populations, including articulatory remediation and training and the deployment of assistive technologies for the impaired (automatic speech recognition, machine speech synthesis). It also has potential broad impact on the clinical needs of those with swallowing disorders or sleep apnea, or those facing recovery of speech function after stroke or surgery. Further, because speech presents the only example of rapid, cognitively controlled, internal movements of the body, the unique challenges of speech production imaging offer the wider biomedical imaging community traction for advances that improve temporal and spatial image resolution, advances with potential import for cardiac and other imaging.

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Research Project (R01)
Project #: 5R01DC007124-12
Application #: 9390471
Study Section: Language and Communication Study Section (LCOM)
Program Officer: Shekim, Lana O
Project Start: 2005-05-01
Project End: 2020-11-30
Budget Start: 2017-12-01
Budget End: 2018-11-30
Support Year: 12
Fiscal Year: 2018
Total Cost:
Indirect Cost:

Name: University of Southern California
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90033

Lingala, Sajan Goud; Zhu, Yinghua; Lim, Yongwan et al. (2017) Feasibility of through-time spiral generalized autocalibrating partial parallel acquisition for low latency accelerated real-time MRI of speech. Magn Reson Med 78:2275-2282
Lingala, Sajan Goud; Zhu, Yinghua; Kim, Yoon-Chul et al. (2017) A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn Reson Med 77:112-125
Hagedorn, Christina; Proctor, Michael; Goldstein, Louis et al. (2017) Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging. J Speech Lang Hear Res 60:877-891
Töger, Johannes; Sorensen, Tanner; Somandepalli, Krishna et al. (2017) Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J Acoust Soc Am 141:3323
Ramanarayanan, Vikram; Van Segbroeck, Maarten; Narayanan, Shrikanth S (2016) Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. Comput Speech Lang 36:330-346
Toutios, Asterios; Narayanan, Shrikanth S (2016) Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process 5:
Chaspari, Theodora; Tsiartas, Andreas; Tsilifis, Panagiotis et al. (2016) Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations. IEEE Trans Signal Process 64:3077-3092
Lingala, Sajan Goud; Sutton, Brad P; Miquel, Marc E et al. (2016) Recommendations for real-time speech MRI. J Magn Reson Imaging 43:28-44
Li, Ming; Kim, Jangwon; Lammert, Adam et al. (2016) Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Comput Speech Lang 36:196-211
Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas et al. (2015) Automatic intelligibility classification of sentence-level pathological speech. Comput Speech Lang 29:132-144

Showing the most recent 10 out of 40 publications