One of the fundamental challenges for communication by speech is the variability of speech production and acoustics. Talkers vary in the size and shape of their vocal tracts, in dialect, and in speaking mannerisms, and these differences all affect the acoustic output. Despite this lack of invariance in the acoustic signal, listeners correctly perceive the speech of many different talkers. This ability to adapt one's perception to the particular acoustic structure of a talker has been investigated for over fifty years. The prevailing explanation is that listeners construct talker-specific representations that serve as referents for subsequent speech sounds: listeners are thought either to create mappings between acoustics and phonemes or to extract the vocal tract anatomy and shape of each individual talker.

The proposed research focuses on an alternative explanation that takes a more general auditory approach. Data from previous studies indicate that listeners may compute an average spectral representation of a talker's speech (the long-term average spectrum, LTAS) and use it as a referent. This process and representation are not speech-specific, yet they can still accommodate some of the talker-specific variability. In previous work, I developed a model of perceptual adaptation that relies on the computation of the LTAS. The goal of this project is to further develop and test this model by obtaining a more accurate estimate of the effective representation of the LTAS and by comparing its predictions to those of perceptual learning approaches. To accomplish these goals, the time window over which listeners compute the LTAS must be determined (Aim #1). The project includes a series of experiments in which preceding context is added to a target sound to determine its effect on categorization of the target. By increasing the duration of the context (with each duration changing the LTAS), I can determine how much of the context is effective in eliciting a perceptual effect. One innovation of these studies is that the speech is synthesized with a realistic vocal tract model, allowing acoustic control constrained by realistic articulations; it also allows me to create different "talkers" with known anatomical and articulatory differences.

The LTAS model will be tested against a traditional perceptual learning approach in predicting the effect on listeners of exposure to novel "dialects" or "accents". Vowel productions will be shifted to produce learnable differences in vowel categorization, and these shifts will have independent effects on the talker's LTAS, making it possible to test which model best explains the perceptual data. The resulting model will delimit listeners' ability to accommodate variation due to anatomical differences, accent, dialect, and even motor speech disorders. It will also indicate what information in the signal is important for adaptive complex sound perception and may be distorted by signal processing in hearing aids and cochlear implants.
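For concreteness, the following is a minimal sketch of how an LTAS referent could be computed over a fixed window of preceding context, the quantity Aim #1 seeks to estimate behaviorally. The window lengths, frame parameters, and toy signal are illustrative assumptions, not part of the proposal or its stimuli; only the general idea (averaging short-time magnitude spectra over recent context) comes from the text above.

```python
# Minimal sketch (illustrative, not the proposal's implementation): compute a
# long-term average spectrum (LTAS) over the most recent `window_s` seconds
# of a signal by averaging short-time magnitude spectra across frames.
import numpy as np
from scipy.signal import stft

def ltas(signal, fs, window_s=0.5, frame_len=512, hop=256):
    """Average magnitude spectrum of the last `window_s` seconds of `signal`."""
    n = int(window_s * fs)
    tail = signal[-n:] if len(signal) > n else signal   # keep recent context only
    freqs, _, Z = stft(tail, fs=fs, nperseg=frame_len, noverlap=frame_len - hop)
    return freqs, np.abs(Z).mean(axis=1)                # mean magnitude per bin

# Toy demo of the Aim #1 logic: the context's spectral balance changes halfway
# (a strong 500 Hz component, then a weaker 2500 Hz component). A short window
# "hears" only the recent high-frequency part; a long window averages in the
# earlier low-frequency part, shifting the LTAS peak.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
early = 1.0 * np.sin(2 * np.pi * 500 * t)
late = 0.6 * np.sin(2 * np.pi * 2500 * t)
context = np.concatenate([early, late])
for w in (0.25, 1.0):
    freqs, spec = ltas(context, fs, window_s=w)
    print(f"window={w:.2f}s -> LTAS peak near {freqs[np.argmax(spec)]:.0f} Hz")
```

Under these assumptions, the 0.25 s window peaks near 2500 Hz while the 1.0 s window peaks near 500 Hz, which is the kind of window-dependent change in the referent that the context-duration experiments are designed to detect perceptually.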
This research will provide insight into the processes and representations that allow listeners to accommodate the variability in speech arising from differences in talker characteristics, including anatomy, speaking style, accent, and motor disability. Current hearing aid and cochlear implant systems can disrupt some of the information that is critical for robust speech perception. The results could inform the future development of these hearing devices, as well as strategies for improving intelligibility.