Adults can recognize between 50,000 and 100,000 words, spoken by different talkers in varying acoustic environments. This ability is remarkable because speech unfolds as a series of transient acoustic events extending over a few hundred ms, without reliable cues to word boundaries. Despite the ubiquitous success of spoken language processing among the general population, 5 percent of first graders enter school with some type of speech sound disorder that cannot be accounted for by hearing impairment. In addition, once the language system has been successfully acquired, it is susceptible to insult from injury or stroke (accounting for 1 million adults in the U.S. with some form of aphasia). Spoken word recognition plays a central role in language acquisition and spoken language comprehension, allowing for storage of the rich array of syntactic, semantic and pragmatic knowledge that is linked to lexical representations and rapid access to this information during comprehension. A more complete understanding of the perceptual and computational capacities underlying spoken word recognition is essential to advancing understanding of both normal and deviant language acquisition and processing. Because the speech signal unfolds over time and the acoustic realization of a word varies with its local environment, it is important to evaluate spoken word recognition at a fine temporal grain using words embedded in continuous speech. This project has established "visual world" eye tracking as a powerful tool for examining spoken word recognition. The methods that we have developed are increasingly being used to address questions about spoken language processing in participant populations across the lifespan from infants to older adults, and in normal and impaired populations. The proposed research has two aims.
The first aim is to evaluate a "data explanation" framework in which the processing of words in continuous speech is modulated by context-based expectations that (a) affect how listeners interpret the input and (b) provide a mechanism for rapid perceptual learning/adaptation. We manipulate speech rate and discourse-based information structure to examine how expectations affect real-time integration of asynchronous cues and how cue-weights are adjusted through perceptual learning.
The second aim focuses on three emerging questions that affect the design, interpretation and analysis of visual world experiments: (1) Are the earliest signal-driven eye movements to pictures (at least partially) mediated by phonological information from displayed pictures, or are eye movements primarily mediated by perceptual representations activated by the spoken word; (2) What is the minimal lag between cues in the speech signal and the first stimulus-driven fixations; and (3) Are fixations affected by state-dependencies, and if so, under what conditions, and how can these effects be modeled within an event-based statistical framework?

Public Health Relevance

Speech unfolds as a series of transient acoustic events without reliable cues to word boundaries. The proposed research uses eye movements to pictures in a display to examine the mechanisms by which listeners rapidly recognize spoken words in continuous speech and adapt to differences among speakers.

National Institutes of Health (NIH)
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Research Project (R01)
Study Section
Language and Communication Study Section (LCOM)
Program Officer
Miller, Brett
University of Rochester
Other Basic Sciences
Schools of Arts and Sciences
United States
Salverda, Anne Pier; Kleinschmidt, Dave; Tanenhaus, Michael K (2014) Immediate effects of anticipatory coarticulation in spoken-word recognition. J Mem Lang 71:145-163
Kurumada, Chigusa; Brown, Meredith; Bibyk, Sarah et al. (2014) Is it or isn't it: listeners make rapid use of prosody to infer speaker meanings. Cognition 133:335-342
Aslin, Richard N (2014) Phonetic Category Learning and Its Influence on Speech Production. Ecol Psychol 26:4-15