Numerous practical and theoretical problems could be addressed if we had a better understanding of the auditory mechanisms underlying phonetic recognition. This proposal is aimed at improving our understanding of these mechanisms, with a particular focus on vowel perception. Although there is a long tradition of representing vowels by the spectral pattern sampled at a single time slice, a growing body of literature suggests that dynamic properties play an important role in vowel identification. Despite this literature, relatively little is known about the precise mechanisms involved in mapping dynamic spectral cues onto perceived vowel quality. Some of the proposed experiments will test specific hypotheses about the way in which this mapping might occur. The experiments will make use of a large database consisting of vowels spoken by 150 talkers (men, women, and children). Measurements of fundamental frequency (F0) and formant contours from these signals will be used in a series of experiments designed to determine the role played by F0, vowel duration, and spectral change in vowel identification. Specific hypotheses will be tested by: (1) acoustic analysis of tokens in the 150-talker database, and (2) listening tests involving various kinds of stimuli resynthesized from these tokens. A second goal of this project is to evaluate the "Masked Peak Representation" (MPR), a new method of representing speech that was developed as an alternative to both traditional formant representations and "whole spectrum" representations. Formant representations are widely used because they can account for a relatively large number of findings in phonetic perception. The principal weakness of formant theory is that tracking formants in natural speech is a difficult and essentially unresolved problem.
Largely in response to this problem, some investigators have proposed a whole spectrum approach in which phonetic quality is controlled by overall spectral shape. The whole spectrum approach, however, cannot account for very convincing data showing that judgments of phonetic quality are affected primarily by the frequencies of spectral peaks, and are relatively unaffected by spectral shape details in nonpeak regions. The MPR was designed to retain maximal sensitivity to spectral peaks without requiring the explicit tracking of formants. The basic idea behind the MPR is to: (1) obtain a pitch-independent spectrum through cepstral smoothing, (2) simulate nonlinear auditory frequency coding by computing a Bark-scale transform, and (3) simulate lateral suppression by subtracting a running average of spectral values. The resulting "masked spectrum" retains spectral peaks but removes most other spectral shape details. The MPR will be evaluated with: (1) an experiment comparing MPR-based predictions of perceived phonetic distance with those of a more traditional auditory model, (2) speech recognition tests that use a Hidden Markov Model to map sequences of MPR spectra onto either words or phonetic segments, and (3) listening tests with speech resynthesized from MPR spectra.
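
The three MPR steps described above can be illustrated with a minimal Python sketch. This is not the project's implementation: the abstract does not give parameter values, so the FFT size, the number of retained cepstral coefficients, the Bark formula (Traunmüller's approximation), the running-average width, and the clipping of negative values to zero are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Assumption: Traunmüller's Hz-to-Bark approximation; the abstract
    # does not say which Bark formula is used.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def cepstral_smooth(frame, n_fft=1024, n_keep=30):
    # Step 1: log-magnitude spectrum -> real cepstrum -> keep only the
    # low-quefrency coefficients, which removes the harmonic (pitch) ripple.
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)
    cep = np.fft.irfft(log_mag, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    lifter[-(n_keep - 1):] = 1.0  # mirror half of the symmetric cepstrum
    return np.fft.rfft(cep * lifter, n_fft).real

def bark_warp(spec, sr, n_points=128):
    # Step 2: resample the smoothed spectrum onto a uniform Bark axis.
    freqs = np.linspace(0.0, sr / 2.0, len(spec))
    bark = hz_to_bark(freqs)
    grid = np.linspace(bark[0], bark[-1], n_points)
    return grid, np.interp(grid, bark, spec)

def masked_spectrum(spec, width=31):
    # Step 3: subtract a running average (width must be odd).
    # Assumption: negative values are clipped to zero so only peaks remain.
    kernel = np.ones(width) / width
    padded = np.pad(spec, width // 2, mode="edge")
    running = np.convolve(padded, kernel, mode="valid")
    return np.maximum(spec - running, 0.0)

def mpr_frame(frame, sr):
    # Chain the three steps for one analysis frame.
    smooth = cepstral_smooth(frame)
    bark_axis, warped = bark_warp(smooth, sr)
    return bark_axis, masked_spectrum(warped)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(512) / sr
    # Toy voiced frame: harmonics of 120 Hz with one resonance near 700 Hz.
    harmonics = np.arange(120.0, sr / 2.0, 120.0)
    gains = 1.0 / (1.0 + ((harmonics - 700.0) / 100.0) ** 2)
    frame = sum(g * np.sin(2 * np.pi * f * t) for f, g in zip(harmonics, gains))
    bark_axis, masked = mpr_frame(frame, sr)
    print(len(bark_axis), float(masked.max()))
```

In a sketch like this, the running-average width controls how broad a spectral region counts as "background": a wider window suppresses less and leaves broader peaks, while a narrower window isolates sharper peaks.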

Agency
National Institutes of Health (NIH)
Institute
National Institute on Deafness and Other Communication Disorders (NIDCD)
Type
Research Project (R01)
Project #
1R01DC001661-01
Application #
3218255
Study Section
Sensory Disorders and Language Study Section (CMS)
Project Start
1992-07-01
Project End
1996-06-30
Budget Start
1992-07-01
Budget End
1993-06-30
Support Year
1
Fiscal Year
1992
Total Cost
Indirect Cost
Name
Western Michigan University
Department
Type
Schools of Allied Health Profes
DUNS #
City
Kalamazoo
State
MI
Country
United States
Zip Code
49008
Hillenbrand, James M; Gayvert, Robert T; Clark, Michael J (2015) Phonetics exercises using the Alvin experiment-control software. J Speech Lang Hear Res 58:171-84
Hillenbrand, James M; Clark, Michael J; Baer, Carter A (2011) Perception of sinewave vowels. J Acoust Soc Am 129:3991-4000
Hillenbrand, James M; Clark, Michael J (2009) The role of f0 and formant frequencies in distinguishing the voices of men and women. Atten Percept Psychophys 71:1150-66
Hillenbrand, James M; Gayvert, Robert T (2005) Open source software for experiment design and control. J Speech Lang Hear Res 48:45-60
de Wet, Febe; Weber, Katrin; Boves, Louis et al. (2004) Evaluation of formant-like features on an automatic vowel classification task. J Acoust Soc Am 116:1781-92
Hillenbrand, James M; Houde, Robert A (2003) A narrow band pattern-matching model of vowel perception. J Acoust Soc Am 113:1044-55
Kardach, Jill; Wincowski, Robert; Metz, Dale Evan et al. (2002) Preservation of place and manner cues during simultaneous communication: a spectral moments perspective. J Commun Disord 35:533-42
Hillenbrand, James M; Houde, Robert A (2002) Speech synthesis using damped sinusoids. J Speech Lang Hear Res 45:639-50
Hillenbrand, J M; Clark, M J; Nearey, T M (2001) Effects of consonant environment on vowel formant patterns. J Acoust Soc Am 109:748-63
Hillenbrand, J M; Clark, M J; Houde, R A (2000) Some effects of duration on vowel recognition. J Acoust Soc Am 108:3013-22

Showing the most recent 10 out of 18 publications