Understanding speech is one of the most important functions of the human brain. To understand speech, we use information from both the auditory modality (the voice of the person we are talking to) and the visual modality (the facial movements of the person we are talking to). This is advantageous because combining the independent information available from the auditory and visual modalities is more accurate than relying on either modality in isolation. An evolving tenet of multisensory integration (of which audiovisual speech perception is one of the most important examples) is that integration is Bayes-optimal: the reliability of each sensory modality is taken into account when the modalities are combined. A key obstacle to progress is our lack of knowledge about whether audiovisual speech perception is also Bayes-optimal. For instance, if we hear a talker say 'ba' (auditory modality) but see them say 'ga' (visual modality), we often perceive a completely different syllable, 'da'. This illusion, known as the McGurk effect, provides a useful demonstration of how poorly we understand multisensory speech perception.

We will construct a computational model of how the brain should combine auditory and visual speech information and fit it to behavioral data collected as subjects listen to audiovisual speech. We will then test the model's predictions using two of the most powerful methods for examining human brain function: blood oxygen-level dependent functional magnetic resonance imaging (BOLD fMRI) and direct neural recording with implanted electrodes (electrocorticography, or ECoG).

Some subjects perceive the McGurk effect and others do not. Our model accounts for this behavioral variability by positing that subjects' encoding of audiovisual speech is corrupted by sensory noise; across-subject differences in perceptual variability are modeled as different levels of sensory noise. We will measure neuronal variability in response to audiovisual syllables using BOLD fMRI, a method well suited to testing large numbers of subjects. We expect that subjects with greater neuronal response variability (more sensory noise) will show greater behavioral variability in speech perception.

To predict how subjects will perceive speech, our model estimates subjects' internal representations of different audiovisual syllables. Our hypothesis is that these model estimates will correspond to the neural representation of audiovisual speech. We will test this hypothesis by constructing a neural dissimilarity index (using the high gamma band response as an index of neuronal firing) and comparing it with the model's estimate of behavioral dissimilarity. Our preliminary results suggest that there are both early (stimulus-driven) and late (cognitive-related) neural responses to audiovisual speech with different representational properties; only ECoG has the temporal resolution needed to distinguish these components.

Our model estimates the weights applied to the individual sensory modalities during multisensory integration but does not specify the source of these weights. Eye movements are a possible moderator of individual differences in these weights. Our hypothesis, supported by preliminary data, is that subjects who weight the visual modality strongly (and do perceive the McGurk effect) preferentially fixate the mouth of the talker in an audiovisual speech stimulus, while subjects who weight the visual modality weakly (and do not perceive the McGurk effect) fixate the eyes of the talker.
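As a rough illustration of the reliability-weighted ("Bayes-optimal") integration scheme described above, the sketch below simulates a single subject combining noisy auditory and visual syllable encodings. It is not the published causal inference model; the one-dimensional syllable axis, the function name integrate_av, and the noise values are illustrative assumptions only.

```python
# Minimal sketch, assuming a hypothetical 1-D perceptual axis for syllables.
# Each cue is encoded with Gaussian sensory noise; cues are combined with
# weights proportional to their reliability (inverse variance), and the
# reported percept is the syllable closest to the fused estimate.
import numpy as np

SYLLABLE_AXIS = {"ba": 0.0, "da": 1.0, "ga": 2.0}  # illustrative coordinates

def integrate_av(auditory_syll, visual_syll, sigma_a, sigma_v, rng=None):
    """Fused percept for one trial; sigma_a / sigma_v are the sensory-noise
    standard deviations of the auditory and visual encodings."""
    rng = np.random.default_rng() if rng is None else rng

    # Noisy internal encodings of the two cues
    x_a = rng.normal(SYLLABLE_AXIS[auditory_syll], sigma_a)
    x_v = rng.normal(SYLLABLE_AXIS[visual_syll], sigma_v)

    # Reliability (inverse-variance) weighting
    w_a, w_v = 1.0 / sigma_a**2, 1.0 / sigma_v**2
    fused = (w_a * x_a + w_v * x_v) / (w_a + w_v)

    # Report the syllable closest to the fused estimate
    return min(SYLLABLE_AXIS, key=lambda s: abs(SYLLABLE_AXIS[s] - fused))

# Auditory "ba" + visual "ga": with comparable weights the fused percept is
# usually the intermediate "da" (McGurk-like fusion); increasing sigma_v
# (down-weighting vision) shifts reports toward the auditory "ba".
rng = np.random.default_rng(0)
percepts = [integrate_av("ba", "ga", sigma_a=0.6, sigma_v=0.6, rng=rng)
            for _ in range(1000)]
print({s: percepts.count(s) for s in SYLLABLE_AXIS})
```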

Public Health Relevance

Understanding speech is one of the most important functions of the human brain. We use information from both the auditory modality (the voice of the person we are talking to) and the visual modality (the facial movements of the person we are talking to) to understand speech. We will use computational models, eye tracking, and brain imaging and recording techniques to study the organization and operation of the brain during audiovisual speech perception.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Neurological Disorders and Stroke (NINDS)
Type
Research Project (R01)
Project #
7R01NS065395-06
Application #
9079790
Study Section
Cognitive Neuroscience Study Section (COG)
Program Officer
Gnadt, James W
Project Start
2015-06-16
Project End
2016-01-31
Budget Start
2015-06-16
Budget End
2016-01-31
Support Year
6
Fiscal Year
2014
Total Cost
$285,272
Indirect Cost
$105,290
Name
Baylor College of Medicine
Department
Neurosurgery
Type
Schools of Medicine
DUNS #
051113330
City
Houston
State
TX
Country
United States
Zip Code
77030
Micheli, Cristiano; Schepers, Inga M; Ozker, Müge et al. (2018) Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex. Eur J Neurosci :
Magnotti, John F; Beauchamp, Michael S (2018) Published estimates of group differences in multisensory integration are inflated. PLoS One 13:e0202908
Ozker, Muge; Yoshor, Daniel; Beauchamp, Michael S (2018) Converging Evidence From Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing. Front Hum Neurosci 12:141
Ozker, Muge; Yoshor, Daniel; Beauchamp, Michael S (2018) Frontal cortex selects representations of the talker's mouth to aid in speech perception. Elife 7:
Rennig, Johannes; Beauchamp, Michael S (2018) Free viewing of talking faces reveals mouth and eye preferring regions of the human superior temporal sulcus. Neuroimage 183:25-36
Zhu, Lin L; Beauchamp, Michael S (2017) Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus. J Neurosci 37:2697-2708
Ozker, Muge; Schepers, Inga M; Magnotti, John F et al. (2017) A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 29:1044-1060
Magnotti, John F; Beauchamp, Michael S (2017) A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech. PLoS Comput Biol 13:e1005229
Magnotti, John F; Mallick, Debshila Basu; Feng, Guo et al. (2016) Erratum to: Similar frequency of the McGurk effect in large samples of native Mandarin Chinese and American English speakers. Exp Brain Res 234:1333
Olds, Cristen; Pollonini, Luca; Abaya, Homer et al. (2016) Cortical Activation Patterns Correlate with Speech Understanding After Cochlear Implantation. Ear Hear 37:e160-72

Showing the most recent 10 out of 32 publications