The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of dynamic vocal tract constriction actions in order to understand the control and production of the compositional units of spoken language. We have pioneered the use of real time MRI for speech imaging to illuminate articulatory dynamics and to understand how these emerge lawfully from the combined effects of vocal tract constriction events distributed over space (subparts of the tract) and over time. This project has developed and refined a novel real time MRI acquisition ability, making possible current reconstruction rates of up to 96 frames per second, quadrupling current imaging speeds. Data show clear real- time movements of the lips, tongue, velum and epiglottis, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. The project has also developed novel noise-mitigated image-synchronized strategies to record speech in-situ during imaging, as well as image processing strategies for deriving linguistically meaningful measures from the data, demon- strating the utility of this approach for linguistic studies of speech communication in a variety of languages. Using our direct access to dynamic information on vocal tract shaping, we investigate vocal tract shaping in three-dimensions as the composition of spatiotemporally coordinated vocal tract action units. This project's specific aims go beyond the dynamic shaping of individual vowels and consonants-postures over time-to examine more complex structuring of articulation-namely, the local and global influences governing linguistic control, temporal coherence and multi-unit coordination. The advances in our technical approach enable a series of studies that leverage: (i) unprecedented high-speech imaging with dynamic rtMRI to consider the prosodic modulation of temporally rapid and temporally coherent speech units; (ii) innovative multi-plane 3D imaging capability to inform the computational identification of linguistic control regimes; and (iii) a large- scale rtMRI corpus ad concomitant machine learning advances to move toward a principled account of system-level co-variability in space and time, both within and among individuals. This symbiotic theory-driven and data-driven research strategy will yield significant innovations in understanding spoken communication. It is no exaggeration to say that the advent of real-time MRI for speech has initiated a dramatic scientific change in the nature of speech production research by allowing for models of production driven by rich quantitative articulatory data. The project is having broad impact through the free dissemination of the unique rtMRI data corpora, tools and models-already used worldwide for research and teaching-and societal out- reach through its website and lay media coverage. Understanding articulatory compositional structure and cross-linguistic potentialities also has critical translational significance impacting the assessment and remediation of speech disorders, as our collaborative work on glossectomy and apraxia has begun to demonstrate.
Real-time imaging of the moving vocal tract with MRI has made direct movies of speech production possible, allowing an investigation of the articulatory composition of speech in healthy adults and illuminating the articulatory dissolution and lack of coherence often found in spoken language disorders. This technology platform, coupled with a linguistically driven theoretical framework that understands speech as composed of articulatory units, provides a scientific foothold for evidence-driven assessment and remediation of speech breakdown in clinical populations, including articulatory remediation and training and deploying assistive technologies for the impaired (automatic speech recognition, machine speech synthesis), and has potential broad impact on the clinical needs of those with swallowing disorders, sleep apnea, or facing recovery of speech function after stroke or surgery. Further, because speech presents the only example of rapid, cognitively- controlled, internal movements of the body, the unique challenges of speech production imaging offer the wider biomedical imaging community traction for advances that improve temporal and spatial image resolution-advances with potential import for cardiac and other imaging.
|Lim, Yongwan; Zhu, Yinghua; Lingala, Sajan Goud et al. (2018) 3D dynamic MRI of the vocal tract during natural speech. Magn Reson Med :|
|Lammert, Adam C; Shadle, Christine H; Narayanan, Shrikanth S et al. (2018) Speed-accuracy tradeoffs in human speech production. PLoS One 13:e0202180|
|Vaz, Colin; Ramanarayanan, Vikram; Narayanan, Shrikanth (2018) Acoustic Denoising using Dictionary Learning with Spectral and Temporal Regularization. IEEE/ACM Trans Audio Speech Lang Process 26:967-980|
|Parrell, Benjamin; Narayanan, Shrikanth (2018) Explaining Coronal Reduction: Prosodic Structure and Articulatory Posture. Phonetica 75:151-181|
|Gupta, Rahul; Audhkhasi, Kartik; Jacokes, Zach et al. (2018) Modeling multiple time series annotations as noisy distortions of the ground truth: An Expectation-Maximization approach. IEEE Trans Affect Comput 9:76-89|
|Lingala, Sajan Goud; Zhu, Yinghua; Lim, Yongwan et al. (2017) Feasibility of through-time spiral generalized autocalibrating partial parallel acquisition for low latency accelerated real-time MRI of speech. Magn Reson Med 78:2275-2282|
|Hagedorn, Christina; Proctor, Michael; Goldstein, Louis et al. (2017) Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging. J Speech Lang Hear Res 60:877-891|
|Töger, Johannes; Sorensen, Tanner; Somandepalli, Krishna et al. (2017) Test-retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J Acoust Soc Am 141:3323|
|Lingala, Sajan Goud; Zhu, Yinghua; Kim, Yoon-Chul et al. (2017) A fast and flexible MRI system for the study of dynamic vocal tract shaping. Magn Reson Med 77:112-125|
|Ramanarayanan, Vikram; Van Segbroeck, Maarten; Narayanan, Shrikanth S (2016) Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. Comput Speech Lang 36:330-346|
Showing the most recent 10 out of 45 publications