In fluent speech, speakers begin to pronounce the next sound before they have finished pronouncing the last. As a result, speech sounds not only occur right next to one another but actually overlap, and pauses occur only between whole phrases, not between individual sounds. This characteristic of fluent speech presents the listener with two formidable problems: separating overlapping sounds, and then recognizing sounds whose acoustics have been distorted by overlap with their neighbors' pronunciations. This proposal pursues the hypothesis that both separation and recognition succeed because successive intervals in the signal contrast with one another perceptually. For example, after an interval in which most of the sound energy is at high frequencies, a sound whose energy is at mid frequencies will sound relatively low; likewise, after a relatively long interval, an interval of intermediate duration will sound relatively short.

The experiments test a version of this hypothesis in which sequential contrast is exaggerated in this way during the initial auditory evaluation of the sounds, before the listener has assigned any linguistic value to them, i.e., before the sounds are recognized as instances of particular categories. If sequential contrast arises before the sounds are recognized, then it will be impervious to any linguistic knowledge the listener may have, e.g., of whether the current sound forms a word with its context, occurs frequently in that context, is phonotactically legal in that context, etc. A separate, prelinguistic, auditory stage of phonetic processing is diagnosed by better discrimination of sound sequences that differ in the direction of their sequential contrast, e.g., high-low vs. low-high, than of sequences that do not, e.g., high-high vs. low-low. If linguistic knowledge is instead used at all stages of processing, these two pairs of sequences should be equally easy to distinguish, because all the intervals will have been assigned to categories and the members of each pair will therefore differ equally.

The results of these experiments therefore permit a choice between interactive models of speech sound recognition, in which listeners use their linguistic knowledge at all stages of processing the speech sounds they hear, and autonomous models, in which they use only the psychoacoustic properties of the signal during the first stage and only later apply what they know linguistically to the output of that stage. If the autonomous model is supported, then the robustness of speech perception under adverse conditions or by impaired listeners can be improved more by enhancing signal quality than by adding redundant linguistic information.
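As a rough illustration of the discrimination prediction above, the toy sketch below (in Python, with made-up numbers and a hypothetical contrast_percept function; it is not the proposal's stimuli, model, or analysis) shows how exaggerating each interval's percept away from the preceding interval pulls apart sequences that differ in the direction of contrast (high-low vs. low-high) more than sequences that do not (high-high vs. low-low), whereas with no contrast exaggeration the two pairs are equally distinct.

    # Toy sketch only: hypothetical values and functions, not the proposal's model.
    def contrast_percept(sequence, gain=0.5):
        """Push each interval's percept away from the preceding interval (gain=0 turns contrast off)."""
        percept = [sequence[0]]
        for prev, cur in zip(sequence, sequence[1:]):
            percept.append(cur + gain * (cur - prev))
        return percept

    def distance(a, b):
        """City-block distance between two perceptual sequences."""
        return sum(abs(x - y) for x, y in zip(a, b))

    high, low = 1.0, 0.0
    for gain in (0.5, 0.0):  # with and without contrast exaggeration
        d_diff = distance(contrast_percept([high, low], gain), contrast_percept([low, high], gain))
        d_same = distance(contrast_percept([high, high], gain), contrast_percept([low, low], gain))
        print(f"gain={gain}: high-low vs low-high = {d_diff}, high-high vs low-low = {d_same}")
    # With contrast (gain=0.5) the different-direction pair is farther apart (3.0 vs 2.0);
    # without it (gain=0.0) the two pairs are equally distinct (2.0 vs 2.0), as an
    # account that categorizes every interval at all stages would predict.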

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Research Project (R01)
Project #: 5R01DC006241-03
Application #: 7086399
Study Section: Language and Communication Study Section (LCOM)
Program Officer: Shekim, Lana O
Project Start: 2004-07-01
Project End: 2007-12-31
Budget Start: 2006-07-01
Budget End: 2007-12-31
Support Year: 3
Fiscal Year: 2006
Total Cost: $268,057
Indirect Cost:
Name: University of Massachusetts Amherst
Department: Psychology
Type: Schools of Arts and Sciences
DUNS #: 153926712
City: Amherst
State: MA
Country: United States
Zip Code: 01003
Kingston, John; Kawahara, Shigeto; Chambless, Della et al. (2014) Context effects as auditory contrast. Atten Percept Psychophys 76:1437-64
Breen, Mara; Kingston, John; Sanders, Lisa D (2013) Perceptual representations of phonotactically illegal syllables. Atten Percept Psychophys 75:101-20
Kingston, John; Kawahara, Shigeto; Mash, Daniel et al. (2011) Auditory contrast versus compensation for coarticulation: data from Japanese and English listeners. Lang Speech 54:499-525
Kingston, John; Kawahara, Shigeto; Chambless, Della et al. (2009) Contextual Effects on the Perception of Duration. J Phon 37:297-320
Kingston, John; Diehl, Randy L; Kirk, Cecilia J et al. (2008) On the internal perceptual structure of distinctive features: The [voice] contrast. J Phon 36:28-54