This project will address fundamental questions about the nature of the link between producing and perceiving speech. Existing experimental evidence points to the automatic and involuntary activation of the speech production system during speech perception. One goal of this project is to investigate what properties of speech are important in the perception-production link, a question that has not been systematically addressed. Speech production involves spatial properties, i.e., which speech articulators make what constrictions where in the vocal tract, and temporal properties, i.e., how those constrictions are arranged relative to one another in time. A series of cue-distractor experiments will be conducted to elucidate the conditions under which perception-production interactions occur in speech. Such interactions will be detected by measuring participants' reaction times when they produce a syllable in response to a visual cue while perceiving another syllable (the distractor). Reaction times will be compared in cases where the cued response and the distractor share spatial properties (for example, "pa" and "ba", which both involve a closing of the upper and lower lips) or temporal properties (for example, "ba" and "da", which both involve similar timing between the oral closure and vowel voicing). Analyzing the reaction times in such tasks will show whether and how the involuntary activation resulting from the perceived distractor combines with the intended response to the visual cue. A second goal of the project is to develop a formal computational model of the perception-production link to account for and unify existing and new evidence for this link.

Action and perception are known to be closely linked in behavior. This study investigates a specific instance of the link between action and perception in speech, thereby sharpening our understanding of the mechanisms underlying human communication. The project will provide an explicit computational model of the link between speech perception and production in specific experimental paradigms involving reaction time data. Such a model has not been developed despite many years of work on the topic of production-perception links. This model will be of value to the field because it will help to refine further predictions, and thus to design new experiments for sharpening our understanding of perceptuo-motor interactions in speech. A better understanding of the relation between perception and production may shed light on the ways in which disorders of one domain may be related to disorders of the other. For example, individuals with non-fluent aphasia have difficulty with speech production but generally not with speech perception, and they have been shown to improve on certain production tasks through multi-sensory speech perception training. This research therefore has potential long-term benefits for revealing clinically relevant insights into the mechanisms of new treatment approaches. In addition, the methods used to collect and analyze response time data, along with the computational modeling component in this project, will add important new components to an existing training framework introducing undergraduates to integrative methods in theoretical and experimental linguistics.

Project Report

This project addressed fundamental questions about the nature of the link between producing and perceiving speech. Experimental evidence points to the automatic and involuntary activation of the speech production system during speech perception. One such type of evidence comes from response times of subjects when they are instructed to say some syllable based on a visual cue. As they are preparing to speak the required syllable, they hear a distractor. Subjects’ spoken responses start sooner when the auditory distractor is the same as their response, and later when it is different. One goal of this project was to investigate what properties of speech are important in the perception-production link. The literature has raised this question but not addressed it systematically. Speech production involves spatial properties, i.e., which speech articulators make what constrictions where in the vocal tract, and temporal properties, i.e., how those constrictions are arranged in time. We conducted experiments to elucidate the conditions under which perception-production interactions occur in speech, detected by measuring response times for syllables produced while another syllable was perceived. Subjects’ response times were compared in cases where the response and distractor shared spatial properties (e.g., "pa" and "ba" both involve a closing of the upper and lower lips) or temporal properties (e.g., "ba" and "da" both involve similar timing between the oral closure and vowel voicing). The experiments showed that both the spatial and temporal properties of speech play a role in the perception-production link. In one experiment, we found that subjects responded faster when the response and distractor matched in temporal properties than when they differed. In another, we found qualitatively similar results for spatial properties.
This project provides the first concrete experimental evidence that the same properties of speech that are fundamental in traditional linguistic description are also active in the link between speech perception and production. Another goal of the project was to develop a formal computational model of the perception-production link to account for disparate sets of results that have been used as evidence of this link. The model we developed focuses on the process by which speech properties are set during production. The perception-production link is formalized as the properties of a perceived utterance serving obligatorily as input to that property-setting process. In our experiments, response times were affected by whether the response and distractor (mis)matched on the properties we tested, but response times in both cases were slower than when subjects heard a tone distractor. Galantucci, Fowler, and Goldstein (2009) found that when the response and distractor were identical, response times were faster than with a tone distractor. Treating the tone as a baseline, we argue that the process of property setting requires both excitation and inhibition of the activation levels associated with the speech properties. When the response and distractor are identical, activation-level excitation is maximized due to fully congruent inputs to the property-setting process. The model predicts response times to be faster in this case than with a tone distractor, as Galantucci et al. found. Each mismatching property between a response and a distractor introduces inhibition to the process, predicting increasingly slower response times based on the number of mismatching properties, as we found in our experiments. Our model provides a unified account of results from these two different studies, formalizes the perception-production link, and identifies the computational principles that are involved in the process where that link is active. 
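The qualitative predictions described above can be illustrated with a small sketch. This is not the project's model, which the report does not specify in detail. It is a toy additive approximation under stated assumptions: each syllable is coded by one spatial property (the main articulator) and one temporal property (voicing timing), a fully processed speech distractor contributes a fixed amount of excitation, and each mismatching property contributes a fixed amount of inhibition. The syllable "ta", all names, and all millisecond values are hypothetical and chosen only to reproduce the ordering of the reported results.

```python
# Toy illustration only: hypothetical feature coding and parameter values,
# not the model developed in the project.

# Assumed coding: (spatial property, temporal property) for each syllable.
# "ta" is not mentioned in the report; it is added to complete the 2x2 design.
SYLLABLES = {
    "pa": ("lips", "voiceless"),
    "ba": ("lips", "voiced"),
    "ta": ("tongue tip", "voiceless"),
    "da": ("tongue tip", "voiced"),
}

BASELINE_RT_MS = 500.0  # assumed response time with a tone distractor
EXCITATION_MS = 20.0    # assumed speed-up from a speech distractor's input
INHIBITION_MS = 25.0    # assumed slow-down per mismatching property


def count_mismatches(response: str, distractor: str) -> int:
    """Count the properties on which response and distractor differ."""
    pairs = zip(SYLLABLES[response], SYLLABLES[distractor])
    return sum(r != d for r, d in pairs)


def predicted_rt(response: str, distractor: str) -> float:
    """Predicted response time in ms under the additive assumption."""
    if distractor == "tone":
        # The tone carries no speech properties, so it serves as the baseline.
        return BASELINE_RT_MS
    mismatches = count_mismatches(response, distractor)
    return BASELINE_RT_MS - EXCITATION_MS + INHIBITION_MS * mismatches


if __name__ == "__main__":
    for distractor in ("ba", "tone", "pa", "da", "ta"):
        print(distractor, predicted_rt("ba", distractor))
```

With these values, an identical distractor yields a faster predicted response than the tone, while one or two mismatching properties yield progressively slower responses than the tone. That ordering holds only because the assumed inhibition per mismatch exceeds the assumed excitation; the linear form itself is a simplification.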
Action and perception are known to be closely linked in behavior. This study investigated a specific instance of the link between action and perception in speech, thereby sharpening our understanding of the mechanisms underlying human communication. The project has led to the development of an explicit model of the perception-production link in speech in a specific experimental paradigm. Such a model had not been developed despite years of literature on production-perception links. This model is valuable to the field because it provides a unified account of new and previous experimental results. It has refined further predictions beyond those tested in the project, leading to the design of new experiments to sharpen our understanding of perception-production interactions in speech. A better understanding of these interactions may shed light on how disorders of one domain may be related to the other. For example, individuals with non-fluent aphasia have difficulty with speech production but generally not with speech perception, and they have been shown to improve on certain production tasks through multi-sensory speech perception training. This research therefore has potential long-term benefits for revealing clinically relevant insights into the mechanisms of new treatment approaches. In addition, the methods used to collect and analyze response time data, along with the computational modeling component in this project, have added important new components to an existing training framework introducing undergraduates to integrative methods in theoretical and experimental linguistics.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 0951831
Program Officer: William J. Badecker
Project Start:
Project End:
Budget Start: 2010-03-15
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $4,778
Indirect Cost:
Name: New York University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10012