Numerous practical and theoretical problems could be addressed if we had a deeper understanding of the auditory and perceptual mechanisms underlying phonetic recognition. Practical applications of this knowledge include improved cochlear implant signal processors, better speech synthesis devices, more robust speech recognition algorithms, and training devices for hearing-impaired speakers. The proposed experiments fall into three major categories.

An extensive series of experiments is designed to test the validity of a new model of vowel perception that was developed during the previous grant period. The model assumes that vowels are identified by a template-matching process in which narrow-band input spectra are compared with a set of smoothed spectral-shape templates learned through ordinary exposure to speech (a schematic sketch of this matching process appears after this summary). An evaluation conducted during the previous grant period showed that the model is capable of recognizing vowels from a large, multitalker database with accuracy approaching that of human listeners. We would like to extend this line of work to address issues such as (1) modeling the integration of temporal and spectral cues to vowel identity, (2) testing the robustness of the theory to variation in factors such as phonetic environment, signal periodicity, and spectral-shape features such as formant amplitude relations, and (3) incorporating a psychologically plausible normalization scheme that might account for the ability of listeners to recognize speech from talkers with diverse vocal-tract characteristics.

A second set of experiments will measure how much phonetic information is transmitted to listeners by speech signals generated by specially designed speech synthesis algorithms that preserve only some characteristics of the original speech signal while purposely removing or distorting other cues. The results will allow inferences to be drawn about the nature of the underlying spectral representations that mediate phonetic recognition.

A third line of work is designed to evaluate a new theory of auditory frequency analysis that was developed during the previous grant period. The theory assumes that the auditory spectrum is calculated not at the periphery but in the central nervous system, through an analysis of auditory nerve firing patterns. A software simulation will be developed to test the validity of the theory.
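To make the matching step concrete, the Python sketch below shows one way a narrow-band template-matching vowel classifier could be organized. It is an illustrative sketch only: the FFT-based spectral analysis, the moving-average smoothing, the mean-removed RMS distance measure, and the template dictionary are assumptions made for the example, not the procedures of the published model (Hillenbrand & Houde, 2003).

    import numpy as np

    def narrowband_spectrum(segment, nfft=1024):
        """Hamming-windowed log-magnitude spectrum of a vowel segment."""
        windowed = segment * np.hamming(len(segment))
        magnitude = np.abs(np.fft.rfft(windowed, nfft))
        return 20.0 * np.log10(magnitude + 1e-12)

    def smooth(spectrum, width=11):
        """Smooth the spectrum with a moving average (assumed smoothing method)."""
        kernel = np.ones(width) / width
        return np.convolve(spectrum, kernel, mode="same")

    def classify_vowel(segment, templates):
        """Label a vowel token by its closest smoothed spectral-shape template.

        `templates` maps vowel labels to smoothed spectra, assumed to have been
        learned in advance from labeled training tokens.
        """
        spec = smooth(narrowband_spectrum(segment))
        spec = spec - spec.mean()                      # compare shape, not overall level
        best_label, best_dist = None, np.inf
        for label, template in templates.items():
            shape = template - template.mean()
            dist = np.sqrt(np.mean((spec - shape) ** 2))   # RMS spectral distance
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label

    # Templates would be built from labeled training tokens (hypothetical data):
    # templates = {label: smooth(narrowband_spectrum(tok)) for label, tok in training_tokens}

Any comparable implementation would also have to address the issues raised above, such as temporal cues, phonetic environment, and talker normalization, which this sketch ignores.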
Hillenbrand, James M; Gayvert, Robert T; Clark, Michael J (2015) Phonetics exercises using the Alvin experiment-control software. J Speech Lang Hear Res 58:171-84
Hillenbrand, James M; Clark, Michael J; Baer, Carter A (2011) Perception of sinewave vowels. J Acoust Soc Am 129:3991-4000
Hillenbrand, James M; Clark, Michael J (2009) The role of f0 and formant frequencies in distinguishing the voices of men and women. Atten Percept Psychophys 71:1150-66
Hillenbrand, James M; Gayvert, Robert T (2005) Open source software for experiment design and control. J Speech Lang Hear Res 48:45-60
de Wet, Febe; Weber, Katrin; Boves, Louis et al. (2004) Evaluation of formant-like features on an automatic vowel classification task. J Acoust Soc Am 116:1781-92
Hillenbrand, James M; Houde, Robert A (2003) A narrow band pattern-matching model of vowel perception. J Acoust Soc Am 113:1044-55
Kardach, Jill; Wincowski, Robert; Metz, Dale Evan et al. (2002) Preservation of place and manner cues during simultaneous communication: a spectral moments perspective. J Commun Disord 35:533-42
Hillenbrand, James M; Houde, Robert A (2002) Speech synthesis using damped sinusoids. J Speech Lang Hear Res 45:639-50
Hillenbrand, J M; Clark, M J; Nearey, T M (2001) Effects of consonant environment on vowel formant patterns. J Acoust Soc Am 109:748-63
Hillenbrand, J M; Clark, M J; Houde, R A (2000) Some effects of duration on vowel recognition. J Acoust Soc Am 108:3013-22
Showing the most recent 10 out of 18 publications