The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of vocal tract constriction actions in order to understand the cognitive control and production of the compositional units of spoken language. We have developed the use of real-time MRI to illuminate the inherently dynamic speech production process. Our approach is to observe the time-varying changes in vocal tract shaping and to understand how these emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the tract) and over time. An understanding of dynamic vocal tract actions as fundamental to linguistic organization will do much to add to the field's current, basically static, approach to describing speech. In the previous (and first) funding period of the project, our team developed and refined a novel real-time MRI acquisition capability that has made veridical real-time movies of speech production possible for the first time without X-rays. The data show clear real-time movements of the lips, tongue, and velum, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. We also developed novel noise-mitigated, image-synchronized strategies to record speech in situ during imaging, as well as signal processing strategies for deriving linguistically meaningful measures from the data (Bresch and Narayanan, 2009). We have demonstrated the utility of this approach for linguistic studies of speech communication that were hitherto not possible (e.g., Byrd, Tobin, Bresch, and Narayanan, 2009; Bresch et al., 2008). Building on these foundational efforts, we situate the specific research aims of our competing renewal proposal as follows.
The specific aims of this proposal are to further develop the technology and analysis platform of real-time MRI, which provides the scaffolding for the project, while pursuing speech production studies with an overarching theme of examining the decomposition of speech into cognitively controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains, each an area of study that is not approachable using acoustic speech data alone, because it requires direct access to dynamic information from the entire vocal tract, which can only be supplied with real-time MRI.
Our specific aims examine (i) compositionality in space: deployment of concurrent constriction events distributed spatially, that is, over distinct constriction effectors within the vocal tract; (ii) compositionality in time: deployment of constriction events distributed temporally; and (iii) compositionality in cognition: deployment of constriction events during speech planning that mirror those observed during speech production. We propose to use the real-time MRI approach we have developed to advance our understanding of all three of these aspects of linguistic structuring. Our approach to decomposing speech shaping into multiple discrete events, in space and over time, can be further validated by demonstrating that we can capture the observed data time-functions using a computational model having only discrete gestural input. To do this, we will employ a computational implementation of Articulatory Phonology and Task Dynamics (called TaDA). The model is particularly appropriate because it provides a hypothesized ensemble of gestures arrayed over time for any input utterance. The model is biologically plausible and produces as its output explicit time-functions of constriction events in the vocal tract, which is precisely what we measure directly with real-time MRI. We anticipate a highly synergistic relation between model and data that can bootstrap our understanding of the structure of speech. The model has not, to this point, been optimized using real data, as the appropriate data did not exist before real-time MRI. In turn, the use of real-time MRI as a tool in understanding speech depends on having an analytical procedure for relating the observed shaping changes to underlying (multiple) controls, which is what the model provides.
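As a rough illustration of how discrete gestural input can yield a continuous constriction time-function, the sketch below simulates a single tract variable driven by a single gesture. It relies on the description, common in the Task Dynamics literature, of a gesture as a critically damped second-order (mass-spring) system that pulls a tract variable toward a target while the gesture is active. This is not TaDA's code: the function name, the stiffness, the rest and target values, and the activation interval are all hypothetical choices made for illustration.

```python
import math

def simulate_constriction(x0, target, onset, offset, stiffness=400.0,
                          dt=0.001, duration=0.6):
    """Return (times, values) for one tract variable driven by one gesture.

    While the gesture is active (onset <= t < offset) the variable is pulled
    toward `target`; otherwise it relaxes back to its rest value `x0`.
    Unit mass and critical damping (b = 2*sqrt(k)) are assumed.
    All parameter values are illustrative, not fitted to data.
    """
    damping = 2.0 * math.sqrt(stiffness)
    x, v = x0, 0.0
    times, values = [], []
    for i in range(int(round(duration / dt))):
        t = i * dt
        # Discrete gestural input: the goal switches when the gesture is on.
        goal = target if onset <= t < offset else x0
        a = -stiffness * (x - goal) - damping * v
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position, using the new velocity
        times.append(t)
        values.append(x)
    return times, values

# Hypothetical example: a constriction degree resting at 10 (arbitrary units)
# is driven toward closure (0) between 0.1 s and 0.4 s, then released.
times, values = simulate_constriction(x0=10.0, target=0.0, onset=0.1, offset=0.4)
```

The resulting `values` series is the kind of smooth constriction trajectory that could, in principle, be compared against a time-function measured from real-time MRI; fitting parameters such as stiffness, target, and activation timing to measured data is the optimization step the text refers to.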
The project's final specific aim is to continue to advance our technical real-time MRI approach for investigating the physical realization of phonological structure by: (i) improving the image signal-to-noise ratio through the use of a novel custom 16-receiver head-neck coil, (ii) doubling the 2D acquisition frame rate through the use of novel pulse sequences in conjunction with new joint acquisition-processing optimization, and (iii) achieving fast 3D imaging using more sophisticated pulse sequences to supplement the single-plane fast imaging work. These challenges will be pursued in tandem with the design of data-driven analyses suitable for distilling the high-dimensional information provided by real-time MRI together with the synchronized acoustic speech signal, which is critical for deriving linguistically meaningful measures. Specifically, we pursue faster and more robust image segmentation and articulatory tracking, as well as methods for dynamical modeling using the derived constriction time series.

Public Health Relevance

The vocal tract is the universal human instrument, played with great dexterity and skill in the production of spoken language. In order to produce the elegant acoustic structure of speech, the linguistically significant actions of the vocal tract must be choreographed with remarkable spatiotemporal precision. The vocal tract airway is also critically involved in functions such as swallowing and breathing. Disruptions to speech and other airway function can have significant effects on the health, well-being, and overall quality of life of individuals. The proposed effort's theoretical, experimental, and methodological approaches focusing on the dynamics of vocal tract shaping are hence significant along several dimensions. The unique capability our team has created to allow direct imaging of the moving vocal tract with MRI, with reconstruction rates of up to 24 images per second and synchronized audio recording, has made veridical real-time movies of speech production possible for the first time without X-rays. The present proposal aims to further develop the technology and analysis platform of real-time MRI while pursuing speech production studies with the overarching linguistic goal of understanding the composition of speech from cognitively controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains (in space, in time, and in cognition), each being an area of study not approachable using exclusively acoustic speech data, because the question of compositionality requires direct access to dynamic information about articulation along the entire vocal tract, which can only be supplied with real-time MRI. In addition to illuminating details of unimpaired speech production, the proposed work provides both technological and theoretical tools to look at clinical disorders in a new way.
In disordered speech it is often critical to have direct articulatory data to accurately describe the spoken language deficit. Further, the theoretical framework that pursues an understanding of speech as composed of cognitively planned action units creates a scientific foothold for evaluating the dissolution and lack of coherence commonly found in disordered speech articulation. Beyond speech production studies, the work has potential broad impact on clinical applications such as those related to swallowing disorders, sleep apnea, and recovery of speech function after stroke or surgery (e.g., glossectomy). Further, because speech presents the only example of rapid, cognitively controlled, internal movements of the body, the unique challenges of speech production imaging offer the wider biomedical imaging community traction for advances that have already improved temporal and spatial image resolution, advances with potential import for cardiac and other imaging. Scientific knowledge of the orchestration of articulatory activity that creates speech is a necessary element in understanding the human communication process. We feel that it is no exaggeration to say that the advent of real-time MRI for speech has initiated a dramatic change in the way speech production research is conducted.

National Institutes of Health (NIH)
National Institute on Deafness and Other Communication Disorders (NIDCD)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BBBP-D (03))
Program Officer
Shekim, Lana O
University of Southern California
Engineering (All Types)
Schools of Engineering
Los Angeles
United States
Ramanarayanan, Vikram; Van Segbroeck, Maarten; Narayanan, Shrikanth S (2016) Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. Comput Speech Lang 36:330-346
Lingala, Sajan Goud; Sutton, Brad P; Miquel, Marc E et al. (2016) Recommendations for real-time speech MRI. J Magn Reson Imaging 43:28-44
Toutios, Asterios; Narayanan, Shrikanth S (2016) Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Trans Signal Inf Process 5:
Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas et al. (2015) Automatic intelligibility classification of sentence-level pathological speech. Comput Speech Lang 29:132-144
Lammert, Adam C; Narayanan, Shrikanth S (2015) On Short-Time Estimation of Vocal Tract Length from Formant Frequencies. PLoS One 10:e0132193
Lammert, Adam; Goldstein, Louis; Ramanarayanan, Vikram et al. (2014) Gestural Control in the English Past-Tense Suffix: An Articulatory Study Using Real-Time MRI. Phonetica 71:229-48
Bone, Daniel; Li, Ming; Black, Matthew P et al. (2014) Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors. Comput Speech Lang 28:
Narayanan, Shrikanth; Toutios, Asterios; Ramanarayanan, Vikram et al. (2014) Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). J Acoust Soc Am 136:1307
Ramanarayanan, Vikram; Lammert, Adam; Goldstein, Louis et al. (2014) Are articulatory settings mechanically advantageous for speech motor control? PLoS One 9:e104168
Kim, Jangwon; Lammert, Adam C; Ghosh, Prasanta Kumar et al. (2014) Co-registration of speech production datasets from electromagnetic articulography and real-time magnetic resonance imaging. J Acoust Soc Am 135:EL115-21
