This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
Despite large acoustic differences in the speech of various talkers, humans are generally able to understand each other quickly and easily. The mechanisms by which humans map such variability onto a set of phonemes have been the subject of research for more than 50 years. This "speaker normalization" problem has generally been framed as normalizing the formant frequencies of a particular speaker against a reference set of formants. This project explores a novel approach to speaker normalization in which subglottal resonances (SGRs), rather than formants, are normalized. SGRs have previously been shown to define a set of frequency bands within which formants may vary while retaining the same phonemic vowel quality. Normalizing SGRs (and their associated frequency bands) therefore effectively reduces formant variability (a simplified sketch of such a normalization appears below). In this project, the effects of SGR normalization on automatic speech recognition (ASR) performance are evaluated for both adult and child speakers of English and Spanish. In parallel, the effects on human speech perception in multi-talker conditions are explored. The results are expected to improve ASR performance and to shed light on human speech production and perception. The project will produce speech databases (including direct recordings of SGR acoustics) and ASR tools, which are critical resources for research in speech production, perception, speaker identification, and speech processing algorithms for cochlear implants and multi-lingual ASR. The collaboration among researchers in Engineering, Linguistics, Speech & Hearing, and Psychology fosters a multidisciplinary learning environment. Publications, results, databases, and tools will be disseminated to the research community.
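
To illustrate the SGR-normalization idea described above, the following is a minimal, hypothetical sketch in Python: it warps a talker's frequency axis so that the talker's subglottal resonances (Sg1, Sg2, Sg3) align with those of a reference talker, using a piecewise-linear mapping. The function name sgr_normalize, the numeric SGR values, and the piecewise-linear warp are illustrative assumptions, not the project's actual algorithm or data.

    # Hypothetical sketch: piecewise-linear SGR-based frequency normalization.
    # Anchors the warp at 0 Hz and at each subglottal resonance (Sg1-Sg3);
    # all names and numbers below are illustrative assumptions only.
    from bisect import bisect_right

    def sgr_normalize(freq_hz, talker_sgrs, reference_sgrs):
        """Map freq_hz from the talker's frequency space into the reference
        space by linear interpolation between successive SGR anchors."""
        src = [0.0] + list(talker_sgrs)       # talker's anchor frequencies
        dst = [0.0] + list(reference_sgrs)    # reference anchor frequencies
        i = bisect_right(src, freq_hz) - 1
        if i >= len(src) - 1:
            # Above the last anchor: extrapolate with the Sg3 ratio.
            return freq_hz * (dst[-1] / src[-1])
        # Linear interpolation within the band [src[i], src[i+1]].
        t = (freq_hz - src[i]) / (src[i + 1] - src[i])
        return dst[i] + t * (dst[i + 1] - dst[i])

    # Illustrative SGR values in Hz (assumed, not measured data).
    talker_sgrs = (600.0, 1550.0, 2280.0)     # talker's Sg1, Sg2, Sg3
    reference_sgrs = (550.0, 1400.0, 2100.0)  # reference Sg1, Sg2, Sg3

    # Normalize a hypothetical second-formant measurement of 1800 Hz;
    # it is mapped into the reference talker's Sg2-Sg3 band (about 1640 Hz).
    print(sgr_normalize(1800.0, talker_sgrs, reference_sgrs))

Because the warp preserves the band in which a formant falls (below Sg1, between Sg1 and Sg2, and so on), it preserves the band structure that, per the project's premise, carries phonemic vowel quality, while removing talker-specific frequency offsets.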