Dynamic Neural Mechanisms of Audiovisual Speech Perception

Natural speech perception is multisensory: when conversing with someone we can see, our brains combine visual (V) information from facial, postural, and hand gestures with auditory (A) information from the voice. The underlying speech processing is extremely rapid, with incoming AV units (e.g., syllables) arriving every few hundred milliseconds; each must be encoded and passed on before the next syllable arrives. Finally, this bottom-up sensory information is combined with a top-down cognitive component: what we perceive is strongly influenced by its context. Speech is fundamentally human, and thus its brain mechanisms are usually studied with noninvasive fMRI, EEG, and MEG. Because each method has critical limitations in spatial or temporal resolution, identifying the specific brain mechanisms of speech perception (AV integration, precise and rapid information encoding, and top-down control) is a nearly intractable problem. This three-year U01 project will sidestep the problem by directly recording neuron-ensemble (electrocorticographic, or ECoG) activity and single-neuron activity, along with direct stimulation of selected sites, in the brains of surgical epilepsy patients as they process AV speech. Our collaborative ECoG team embodies expertise in multisensory integration and speech perception and leverages the skills and perspectives of neuroscientists, neurosurgeons, engineers, neuropsychologists, neurologists, and ethicists across three leading epilepsy centers: Columbia University Medical Center, Baylor College of Medicine, and North Shore-Long Island Jewish Medical Center. By combining the expertise and patients available at all three centers, we will be able to tackle problems that are inaccessible to individual investigators.
Our overarching hypothesis, building on our past work and supported by preliminary data, is that fluctuations in the excitability of neurons, known as oscillations, play a key role in speech processing.
Aim 1 tests the general hypothesis that delta/theta range (2-8 Hz) neuronal oscillations play a key role in the integration of auditory and visual speech information.
Aim 2 tests the general hypothesis that high-frequency activity (50 Hz and above) encodes representations of auditory and visual speech information, reflecting both bottom-up and top-down influences on perception.

The concept employed in this proposal, of oscillatory dynamics as mechanistic instruments (Aim 1) that organize the encoding of information in neuronal firing patterns under dynamic top-down control (Aim 2), is part of a paradigm shift in speech science. The broad goal of this proposal is to contribute key foundations for this new paradigm and to set the stage for a comprehensive understanding of the brain circuits and physiological processes underlying natural speech perception, including in complex social settings.

Public Health Relevance

Understanding speech is an everyday phenomenon that we usually take for granted. The processes underlying our brain's ability to perform this function, however, are complex. Overlapping networks must combine auditory information (what we actually hear) and visual information (mouth movements) into a coherent percept. This processing has to occur on a rapid timescale, on the order of hundreds of milliseconds, which is the tempo of arrival of individual units of speech (syllables). In this proposal, we study how fluctuations in neuronal activity, known as oscillations, play a key role in processing speech information. We theorize that these oscillations allow networks to communicate with one another, integrate information, fill in missing information, and arrive at a coherent understanding of spoken language. Understanding how this process occurs will directly improve our ability to treat neurological disorders involving dysfunctional speech and language processing, such as learning disabilities, dyslexia, stroke, autism spectrum disorders, schizophrenia, and many others.

National Institutes of Health (NIH)
National Institute of Neurological Disorders and Stroke (NINDS)
Research Project--Cooperative Agreements (U01)
Study Section: Special Emphasis Panel (ZNS1)
Program Officer: Gnadt, James W
Institution: Columbia University (N.Y.), Schools of Medicine, New York, United States
Publications

Zhang, Honghui; Watrous, Andrew J; Patel, Ansh et al. (2018) Theta and Alpha Oscillations Are Traveling Waves in the Human Neocortex. Neuron 98:1269-1281.e4
Micheli, Cristiano; Schepers, Inga M; Ozker, Müge et al. (2018) Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex. Eur J Neurosci
Ozker, Muge; Yoshor, Daniel; Beauchamp, Michael S (2018) Converging Evidence From Electrocorticography and BOLD fMRI for a Sharp Functional Boundary in Superior Temporal Gyrus Related to Multisensory Speech Processing. Front Hum Neurosci 12:141
Lázaro-Muñoz, Gabriel; Yoshor, Daniel; Beauchamp, Michael S et al. (2018) Continued access to investigational brain implants. Nat Rev Neurosci 19:317-318
Ozker, Muge; Schepers, Inga M; Magnotti, John F et al. (2017) A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 29:1044-1060