Speech is central to human life. Yet how the human brain processes speech in complex everyday situations remains poorly understood. One prominent idea is that speech perception is carried out using brain areas and mechanisms that are used for processing sounds more generally. It has been suggested that these mechanisms become specialized for speech through learning, resulting in a speech processing network in the brain that processes increasingly complex aspects of the speech signal at successive hierarchical stages. But questions about the function of this hierarchy remain. In particular, while it is commonly acknowledged that seeing a speaker's face in noisy environments can improve comprehension, how visual speech influences the hierarchical processing of speech remains unclear. This is unfortunate, as speech processing, and multisensory speech processing in particular, has been reported to be affected in a number of clinical disorders, including autism and schizophrenia. Thus, as well as contributing to our understanding of this most fundamental of human abilities, better knowledge of the neural mechanisms underpinning audiovisual speech processing could have important clinical research implications. One of the principal reasons for our lack of knowledge on the neurophysiology of audiovisual speech is the technical challenge of indexing the neural processing of natural speech with high temporal resolution and at multiple levels of the speech processing hierarchy. Non-human primates represent an imperfect model for studying human speech processing, the hemodynamic changes underlying functional magnetic resonance imaging are too slow to track natural speech dynamics, and electrocorticography samples only a limited number of brain areas and cannot be broadly applied in clinical research. Recently, our group has introduced several new approaches for indexing natural speech processing using electroencephalography (EEG). These include entirely novel frameworks for producing dependent measures of the hierarchical encoding of natural speech and for quantifying multisensory integration of natural audiovisual speech. The present proposal seeks to exploit this opportunity to test the hypothesis that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension under the current listening conditions. Across three objectives, the proposal aims to characterize this flexibility by determining how the hierarchical processing stage at which visual and audio speech are integrated varies as a function of 1) the listening environment, 2) the visual information available, and 3) the deployment of attention. The work promises to bring a new depth of understanding to the perception of one of humanity's most essential signals, and it will introduce several novel analyses and experimental paradigms that should be readily deployable in research on clinical cohorts in which speech processing and/or multisensory integration is impaired.
While it is well known that seeing a speaker's face in a noisy environment can help you to understand what they are saying, how the brain actually combines audio and visual speech information to achieve this is not well understood. A better understanding of the neural mechanisms involved in this process would be of great benefit, as the so-called multisensory integration of audio and visual speech has been reported to be specifically affected in developmental and psychiatric disorders, including autism and schizophrenia. This project seeks to exploit several brand-new approaches for studying natural audiovisual speech integration in the healthy human brain, so as to gain greater insight into how the brain so effortlessly combines speech information from vision and sound, with a view to informing future clinical research in several patient populations.