The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of vocal tract constriction actions in order to understand the cognitive control and production of the compositional units of spoken language. We have developed the use of real-time MRI to illuminate the inherently dynamic speech production process. Our approach is to observe the time-varying changes in vocal tract shaping and to understand how these emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the tract) and over time. An understanding of dynamic vocal tract actions as fundamental to linguistic organization will do much to add to the field's current, basically static, approach to describing speech. In the previous (and first) funding period of the project, our team developed and refined a novel real-time MRI acquisition capability that has made veridical real-time movies of speech production possible for the first time without X-rays. The data show clear real-time movements of the lips, tongue, and velum, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. We also developed novel noise-mitigated, image-synchronized strategies to record speech in situ during imaging, as well as signal processing strategies for deriving linguistically meaningful measures from the data (Bresch and Narayanan, 2009). We have demonstrated the utility of this approach for linguistic studies of speech communication that were hitherto not possible (e.g., Byrd, Tobin, Bresch, and Narayanan, 2009; Bresch et al., 2008). Building on these foundational efforts, we situate the specific research aims of our competing renewal proposal as follows.
The specific aims of this proposal are to further develop the technology and analysis platform of real-time MRI, which provides the scaffolding for the project, while pursuing speech production studies with an overarching theme of examining the decomposition of speech into cognitively controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains, each an area of study that is not approachable using exclusively acoustic speech data, because it requires direct access to dynamic information from the entire vocal tract, which can only be supplied by real-time MRI.
Our specific aims examine (i) compositionality in space: deployment of concurrent constriction events distributed spatially, that is, over distinct constriction effectors within the vocal tract; (ii) compositionality in time: deployment of constriction events distributed temporally; and (iii) compositionality in cognition: deployment of constriction events during speech planning that mirror those observed during speech production. We propose to use the real-time MRI approach we have developed to advance our understanding of all three of these aspects of linguistic structuring. Our approach to decomposing speech shaping into multiple discrete events, in space and over time, can be further validated by demonstrating that we can capture the observed data time-functions using a computational model having only discrete gestural input. To do this, we will employ a computational implementation of Articulatory Phonology and Task Dynamics (called TaDA). The model is particularly appropriate because it provides a hypothesized ensemble of gestures arrayed over time for any input utterance. The model is biologically plausible and produces as its output explicit time-functions of constriction events in the vocal tract, which is precisely what we measure directly with real-time MRI. We anticipate a highly synergistic relation between model and data that can bootstrap our understanding of the structure of speech. The model has not, to this point, been optimized using real data, as the appropriate data did not exist before real-time MRI. In turn, the use of real-time MRI as a tool in understanding speech depends on having an analytical procedure for relating the observed shaping changes to underlying (multiple) controls, which is what the model provides.
The project's final specific aim is to continue to advance our technical real-time MRI approach for investigating the physical realization of phonological structure by (i) improving the image signal-to-noise ratio through the use of a novel custom 16-receiver head and neck coil, (ii) doubling the 2D acquisition frame rate through the use of novel pulse sequences in conjunction with new joint acquisition-processing optimization, and (iii) pursuing fast 3D imaging using more sophisticated pulse sequences to supplement the single-plane fast imaging work. These challenges will be pursued in tandem with the design of data-driven analyses suitable for distilling the high-dimensional information provided by real-time MRI and by the synchronized acoustic speech signal, which is critical for deriving linguistically meaningful measures. Specifically, we pursue robust and faster image segmentation and articulatory tracking, and methods for dynamical modeling using the derived time series of constriction data.
The vocal tract is the universal human instrument, played with great dexterity and skill in the production of spoken language. In order to produce the elegant acoustic structure of speech, the linguistically significant actions of the vocal tract must be choreographed with remarkable spatiotemporal precision. The vocal tract airway is also critically involved in functions such as swallowing and breathing. Disruptions to speech and other airway functions can have significant effects on the health, well-being, and overall quality of life of individuals. The proposed effort's theoretical, experimental, and methodological approaches focusing on the dynamics of vocal tract shaping are hence significant along several dimensions. The unique capability our team has created to allow direct imaging of the moving vocal tract with MRI, with reconstruction rates of up to 24 images per second and synchronized audio recording, has made veridical real-time movies of speech production possible for the first time without X-rays. The present proposal aims to further develop the technology and analysis platform of real-time MRI while pursuing speech production studies with the overarching linguistic goal of understanding the composition of speech from cognitively controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains (in space, in time, and in cognition), each being an area of study not approachable using exclusively acoustic speech data, because the question of compositionality requires direct access to dynamic information about articulation along the entire vocal tract, which can only be supplied by real-time MRI. In addition to illuminating details of unimpaired speech production, the proposed work provides both technological and theoretical tools to look at clinical disorders in a new way.
In disordered speech it is often critical to have direct articulatory data to accurately describe the spoken language deficit. Further, the theoretical framework that pursues an understanding of speech as composed of cognitively planned action units creates a scientific foothold for evaluating the dissolution and lack of coherence commonly found in disordered speech articulation. Beyond speech production studies, the work has potential broad impact on clinical applications such as those related to swallowing disorders, sleep apnea, and recovery of speech function after stroke or surgery (e.g., glossectomy). Further, because speech presents the only example of rapid, cognitively controlled, internal movements of the body, the unique challenges of speech production imaging offer the wider biomedical imaging community traction for advances that have already improved temporal and spatial image resolution, advances with potential import for cardiac and other imaging. Scientific knowledge of the orchestration of articulatory activity that creates speech is a necessary element in understanding the human communication process. We feel that it is no exaggeration to say that the advent of real-time MRI for speech has initiated a dramatic change in the way speech production research is conducted.