Speech perception is one of the most important cognitive operations performed by the human brain, and it is fundamentally multisensory: when conversing with someone, we use both visual information from their face and auditory information from their voice. Multisensory speech perception is especially important when the auditory component of speech is noisy, whether because of a hearing disorder or normal aging. However, much less is known about the neural computations underlying visual speech perception than about those underlying auditory speech perception. To remedy this gap, we will use converging evidence from two complementary measures of brain activity, BOLD fMRI and electrocorticography (ECoG). The results of these neural recording studies will be interpreted in the context of a flexible computational model based on the emerging tenet that the brain performs multisensory integration using optimal or Bayesian inference, combining the currently available sensory information with prior experience.

In the first Aim, a Bayesian model will be constructed to explain individual differences in multisensory speech perception along three axes: subjects' ability to understand noisy audiovisual speech; subjects' susceptibility to the McGurk effect, a multisensory illusion; and the time subjects spend fixating the mouth of a talking face. In the second Aim, we will explore the neural encoding of visual speech using voxel-wise forward encoding models of the BOLD fMRI signal, developing encoding models to test seven different theories of visual speech representation drawn from the linguistics and computer vision literatures. In the third Aim, we will use ECoG to examine the neural computations for integrating visual and auditory speech, guided by the Bayesian models developed in Aim 1. First, we will test the model's prediction of reduced neural variability for multisensory speech. Second, we will characterize the representational space of unisensory and multisensory speech.
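To make the Bayesian framing concrete, the sketch below shows reliability-weighted fusion of a Gaussian auditory cue and a Gaussian visual cue with an optional Gaussian prior. It is a minimal illustration only: the one-dimensional "phoneme axis," the function name fuse_gaussian_cues, and all numerical values are assumptions chosen for exposition, and the sketch omits the causal inference step (deciding whether the auditory and visual cues come from the same source) that the project's actual models include.

```python
import numpy as np

def fuse_gaussian_cues(x_aud, var_aud, x_vis, var_vis,
                       mu_prior=0.0, var_prior=np.inf):
    """Bayes-optimal (reliability-weighted) fusion of two Gaussian cues.

    Each cue is an estimate along a hypothetical 1-D feature axis with its own
    variance; var_prior=np.inf corresponds to a flat prior. Returns the
    posterior mean and variance.
    """
    precisions = np.array([1.0 / var_aud, 1.0 / var_vis, 1.0 / var_prior])
    means = np.array([x_aud, x_vis, mu_prior])
    post_var = 1.0 / precisions.sum()
    post_mean = post_var * np.dot(precisions, means)
    return post_mean, post_var

# Hypothetical 1-D "phoneme axis": 0 = /ba/, 1 = /ga/ (illustrative units only).
# A noisy auditory cue near /ba/ is combined with a reliable visual cue near /ga/.
mean, var = fuse_gaussian_cues(x_aud=0.2, var_aud=1.0, x_vis=0.9, var_vis=0.1)
print(f"fused estimate = {mean:.2f}, fused variance = {var:.3f}")
# The fused estimate (~0.84) is pulled toward the more reliable visual cue, and
# the fused variance (~0.091) is smaller than either unisensory variance --
# the reduced-variability property that Aim 3 will look for in neural activity.
```

In the full causal-inference formulation (e.g., Magnotti & Beauchamp 2017, listed below), this fused estimate is additionally weighted against a segregated, unisensory estimate according to the inferred probability that the auditory and visual cues share a common cause.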

Public Health Relevance

Understanding speech is one of the most important functions of the human brain. To understand speech, we use information from both the auditory modality (the voice of the person we are talking to) and the visual modality (that person's facial movements). We will use computational models, eye tracking, and brain imaging and recording techniques to study the organization and operation of the brain during audiovisual speech perception.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Neurological Disorders and Stroke (NINDS)
Type
Research Project (R01)
Project #
2R01NS065395-07A1
Application #
9055439
Study Section
Special Emphasis Panel (ZRG1-IFCN-Q (02))
Program Officer
Gnadt, James W
Project Start
2016-01-01
Project End
2021-12-31
Budget Start
2016-01-01
Budget End
2017-12-31
Support Year
7
Fiscal Year
2016
Total Cost
$346,719
Indirect Cost
$127,969
Name
Baylor College of Medicine
Department
Neurosurgery
Type
Schools of Medicine
DUNS #
051113330
City
Houston
State
TX
Country
United States
Zip Code
77030
Micheli, Cristiano; Schepers, Inga M; Ozker, Müge et al. (2018) Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex. Eur J Neurosci :
Magnotti, John F; Beauchamp, Michael S (2018) Published estimates of group differences in multisensory integration are inflated. PLoS One 13:e0202908
Ozker, Muge; Yoshor, Daniel; Beauchamp, Michael S (2018) Converging Evidence From Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing. Front Hum Neurosci 12:141
Ozker, Muge; Yoshor, Daniel; Beauchamp, Michael S (2018) Frontal cortex selects representations of the talker's mouth to aid in speech perception. Elife 7:
Rennig, Johannes; Beauchamp, Michael S (2018) Free viewing of talking faces reveals mouth and eye preferring regions of the human superior temporal sulcus. Neuroimage 183:25-36
Zhu, Lin L; Beauchamp, Michael S (2017) Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus. J Neurosci 37:2697-2708
Ozker, Muge; Schepers, Inga M; Magnotti, John F et al. (2017) A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 29:1044-1060
Magnotti, John F; Beauchamp, Michael S (2017) A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech. PLoS Comput Biol 13:e1005229
Magnotti, John F; Mallick, Debshila Basu; Feng, Guo et al. (2016) Erratum to: Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers. Exp Brain Res 234:1333
Olds, Cristen; Pollonini, Luca; Abaya, Homer et al. (2016) Cortical Activation Patterns Correlate with Speech Understanding After Cochlear Implantation. Ear Hear 37:e160-72

Showing the most recent 10 out of 32 publications