Speech perception is one of the most important cognitive operations performed by the human brain and is fundamentally multisensory: when conversing with someone, we use both visual information from their face and auditory information from their voice. Multisensory speech perception is especially important when the auditory component of speech is noisy, due either to a hearing disorder or to normal aging. However, much less is known about the neural computations underlying visual speech perception than about those underlying auditory speech perception. To remedy this gap in knowledge, we will use converging evidence from two complementary measures of brain activity, BOLD fMRI and electrocorticography (ECoG). The results of these neural recording studies will be interpreted in the context of a flexible computational model based on the emerging tenet that the brain performs multisensory integration using optimal or Bayesian inference, combining the currently available sensory information with prior experience.

In the first Aim, a Bayesian model will be constructed to explain individual differences in multisensory speech perception along three axes: subjects' ability to understand noisy audiovisual speech; subjects' susceptibility to the McGurk effect, a multisensory illusion; and the time subjects spend fixating the mouth of a talking face. In the second Aim, we will explore the neural encoding of visual speech using voxel-wise forward encoding models of the BOLD fMRI signal. We will develop encoding models to test seven theories of visual speech representation drawn from the linguistic and computer vision literature. In the third Aim, we will use ECoG to examine the neural computations for integrating visual and auditory speech, guided by the Bayesian models developed in Aim 1. First, we will test the reduction in neural variability for multisensory speech predicted by our model. Second, we will characterize the representational space of unisensory and multisensory speech.
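To make the modeling framework concrete, the sketch below (in Python) illustrates the standard reliability-weighted, or Bayesian, cue-combination rule under the simplifying assumptions of independent Gaussian cue likelihoods and a Gaussian prior. The function name and parameters are illustrative only and do not describe the project's actual model, which would operate on phoneme and viseme likelihoods rather than a single scalar cue.

    # Minimal sketch of reliability-weighted (Bayesian) cue combination for
    # audiovisual speech. Names and parameters are illustrative, not drawn
    # from the project; a full model would operate on phoneme/viseme
    # likelihoods rather than a single scalar cue.

    def fuse_audiovisual(mu_a, var_a, mu_v, var_v, mu_prior=0.0, var_prior=1e6):
        """Combine auditory and visual estimates with a Gaussian prior.

        Each cue is summarized by a mean (mu) and a variance (var); lower
        variance means a more reliable cue. Returns the posterior mean and
        posterior variance of the fused estimate.
        """
        # Under independent Gaussian noise, precisions (inverse variances) add.
        precision = 1.0 / var_a + 1.0 / var_v + 1.0 / var_prior
        var_post = 1.0 / precision
        # The posterior mean is the precision-weighted average of the cues
        # and the prior.
        mu_post = var_post * (mu_a / var_a + mu_v / var_v + mu_prior / var_prior)
        return mu_post, var_post

    # Example: a noisy auditory cue combined with a clearer visual cue. The
    # fused estimate is pulled toward the visual cue, and its variance is
    # lower than that of either cue alone.
    print(fuse_audiovisual(mu_a=1.0, var_a=4.0, mu_v=0.2, var_v=1.0))

In this toy example, the posterior variance is smaller than either unisensory variance; this variance reduction under cue combination loosely corresponds to the reduced variability for multisensory speech examined in the third Aim.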
Understanding speech is one of the most important functions of the human brain. We use information from both the auditory modality (our conversational partner's voice) and the visual modality (their facial movements) to understand speech. We will use computational models, eye tracking, and brain imaging and recording techniques to study the organization and operation of the brain during audiovisual speech perception.