Cocktail Party Problem: Perspective on Neurobiology of Auditory Scene Analysis

Elhilali, Mounya

Abstract

A Multi-scale Perspective on the Neurobiology of Auditory Scene Analysis
A. Aims and Significance: Despite the enormous advances in computing technology over the last decades, there are still many tasks that are easy for a child yet difficult for advanced computer systems. A particular challenge for most existing systems is dealing with complex acoustic environments, background noise and competing talkers, a challenge often experienced at cocktail parties (Cherry, 1953) and formally referred to as auditory scene analysis (Bregman, 1990). Progress in this field has tremendous implications and long-term benefits covering the medical, industrial, military and robotics domains, as well as improving communication aids (hearing aids, cochlear implants, speech-based human-computer interfaces) for the sensory-impaired and aging brains. Despite its importance for both engineering and perceptual sciences, the study of the neural underpinnings of auditory scene analysis remains in its infancy. This field is particularly challenged by the lack of integrative theories that combine our knowledge of the perceptual bases of scene analysis with the neural mechanisms along the various stages of the auditory pathway. Because of the nature of the problem, the neural circuitry at play is intricate and multi-scale by design. The objective of the proposed research is to provide a systems view of modeling scene analysis that integrates mechanisms at the single-neuron level, the population level and across-area interactions. The intellectual merit of the proposed theory is to elucidate the specific mechanisms and computational rules at play, facilitate their integration in engineering systems and enable the generation of novel testable predictions. The proposal investigates the key hypothesis that attention to a feature of a complex sound instantiates all elements that are coherent with this feature, thus binding them together as one perceptual "object" or stream.
This "binding hypothesis" requires three scales of analysis: a micro-level mapping of complex sounds into a multidimensional cortical feature representation; a meso-level coherence analysis correlating activity in populations of cortical neurons; and macro-level feedback processes of attention and expectations that mediate auditory object formation. We shall formulate this hypothesis within a multi-scale computational framework that provides a unified theory for the neural underpinnings of auditory scene analysis. The three core research aims of this project explore all facets of this model, employing computational and physiological approaches:
Aim I. A multi-scale coherence model: The main goal is to formulate the "binding hypothesis" as a unified, biologically plausible theory of auditory streaming, integrating multi-scale sensory with cognitive cortical mechanisms. This computational effort will incorporate findings from experiments in Aims II and III, generate testable predictions, and provide effective algorithmic implementations to tackle the "cocktail party problem" in biomedical applications;
Aim II. Physiological investigations of the multi-scale coherence theory: Our aim is to use an animal model to record single-unit (micro-level, meso-level) and across-area (macro-level) physiological activity in both primary auditory and prefrontal cortex, while presenting sufficiently complex acoustic environments so as to test and refine the computational model;
Aim III. Refinement of the coherence theory with physiological and perceptual testing in humans: The objective is to directly test predictions from the model in human subjects, using magnetoencephalography (MEG) and psychoacoustic experiments. We shall particularly focus on the role of cortical mechanisms in scene analysis in normal and aging brains.
The proposed research draws upon the expertise of a cross-disciplinary team integrating neurobiology and engineering. It is unique in that it is the first effort to postulate a role for coherence in the scene analysis problem, and to investigate the "binding hypothesis" integrating cortical and attention mechanisms in auditory streaming experiments. In addition, by testing the theory directly on human subjects and comparing normal and aging brains (known to face perceptual difficulties in cocktail party settings), we hope to better understand the neural underpinnings of scene analysis under their normal and malfunctioning states, hence enhancing the translational potential of the model. The broader impact of this effort is to provide versatile and tractable models of auditory stream segregation, significantly facilitating the integration of such capabilities in engineering systems.
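
As a rough illustration of the coherence idea in the abstract (elements whose activity is coherent with an attended feature are bound into one stream), the sketch below correlates channel envelopes with an attended channel and groups those above a threshold. This is not the project's model: the function name bind_by_coherence, the use of Pearson correlation as the coherence measure, the 0.5 threshold and the synthetic envelopes are all assumptions made for illustration.

```python
import numpy as np

def bind_by_coherence(envelopes, attended, threshold=0.5):
    """Return a boolean mask of channels grouped with the attended channel.

    envelopes: array of shape (channels, time), one envelope per feature channel.
    attended:  index of the channel that attention is directed to.
    threshold: hypothetical cutoff on the Pearson correlation (an assumption).
    """
    corr = np.corrcoef(envelopes)      # pairwise correlation across channels
    return corr[attended] >= threshold  # channels coherent with the attended one

# Synthetic example: two alternating on/off streams (assumed toy data).
t = np.arange(0.0, 2.0, 0.01)
stream_a = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)  # 4 Hz on/off pattern
stream_b = 1.0 - stream_a                                  # anti-phase pattern

envelopes = np.stack([
    stream_a,        # channel 0: attended feature
    0.5 * stream_a,  # channel 1: same timing, lower level
    stream_b,        # channel 2: competing stream
    0.8 * stream_b,  # channel 3: competing stream, lower level
])

mask = bind_by_coherence(envelopes, attended=0)
print(mask.tolist())  # [True, True, False, False]
```

Channels 0 and 1 share the same temporal pattern and are grouped together, while the anti-phase channels 2 and 3 are left out. A fuller treatment would compute coherence over short time windows and on a cortical feature representation rather than on raw envelopes.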

Public Health Relevance

Overcoming the Cocktail Party Problem: A Multi-scale Perspective on the Neurobiology of Auditory Scene Analysis
Project Relevance: The question of how complex acoustic scenes are parsed by the auditory system into auditory objects and streams is one of the most fundamental questions in perceptual science. Despite its importance, the study of its underlying neural mechanisms remains in its infancy. We believe that significant progress in this area can be achieved by combining sophisticated computational modeling and psychophysical techniques with recently available methods for neural recording from awake behaving animals in interdisciplinary efforts, such as the one described in this proposal. In addition, by testing the theory directly on human subjects and comparing normal and aging brains (known to face perceptual difficulties in cocktail party settings), we hope to better understand the neural underpinnings of scene analysis under their normal and malfunctioning states, hence enhancing the translational potential of the model. The broader impact of this effort is to provide versatile and tractable models of auditory stream segregation, significantly facilitating the integration of such capabilities in engineering systems, as well as improving communication aids (hearing aids, cochlear implants, speech-based human-computer interfaces) for the sensory-impaired and aging brains.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Aging (NIA)
Type: Research Project (R01)
Project #: 5R01AG036424-04
Application #: 8477104
Study Section: Modeling and Analysis of Biological Systems Study Section (MABS)
Program Officer: Chen, Wen G

Project Start: 2010-06-01
Project End: 2015-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 4
Fiscal Year: 2013
Total Cost: $398,608
Indirect Cost: $85,276

Institution

Name: Johns Hopkins University
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2014 R01 AG	Cocktail Party Problem: Perspective on Neurobiology of Auditory Scene Analysis Elhilali, Mounya / Johns Hopkins University
NIH 2013 R01 AG	Cocktail Party Problem: Perspective on Neurobiology of Auditory Scene Analysis Elhilali, Mounya / Johns Hopkins University	$398,608
NIH 2012 R01 AG	Cocktail Party Problem: Perspective on Neurobiology of Auditory Scene Analysis Elhilali, Mounya / Johns Hopkins University	$433,424
NIH 2011 R01 AG	Cocktail Party Problem: Perspective on Neurobiology of Auditory Scene Analysis Elhilali, Mounya / Johns Hopkins University	$444,772
NIH 2010 R01 AG	Overcoming the Cocktail Party Problem: A Multi-scale Perspective on the Neurobio Elhilali, Mounya / Johns Hopkins University	$477,845

Publications

Akram, Sahar; Simon, Jonathan Z; Babadi, Behtash (2017) Dynamic Estimation of the Auditory Temporal Response Function From MEG in Competing-Speaker Environments. IEEE Trans Biomed Eng 64:1896-1905

Kaya, Emine Merve; Elhilali, Mounya (2017) Modelling auditory attention. Philos Trans R Soc Lond B Biol Sci 372:

Wolmetz, Michael; Elhilali, Mounya (2016) Attentional and Contextual Priors in Sound Perception. PLoS One 11:e0149635

Akram, Sahar; Presacco, Alessandro; Simon, Jonathan Z et al. (2016) Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling. Neuroimage 124:906-917

Akram, Sahar; Simon, Jonathan Z; Babadi, Behtash (2016) Dynamic Estimation of the Auditory Temporal Response Function from MEG in Competing-Speaker Environments. IEEE Trans Biomed Eng :

Carlin, Michael A; Elhilali, Mounya (2015) A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields. IEEE/ACM Trans Audio Speech Lang Process 23:2422-2433

Emmanouilidou, Dimitra; McCollum, Eric D; Park, Daniel E et al. (2015) Adaptive Noise Suppression of Pediatric Lung Auscultations With Real Applications to Noisy Clinical Settings in Developing Countries. IEEE Trans Biomed Eng 62:2279-88

Patil, Kailash; Elhilali, Mounya (2015) Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases. EURASIP J Audio Speech Music Process 2015:

Sell, Gregory; Suied, Clara; Elhilali, Mounya et al. (2015) Perceptual susceptibility to acoustic manipulations in speaker discrimination. J Acoust Soc Am 137:911-22

Akram, Sahar; Englitz, Bernhard; Elhilali, Mounya et al. (2014) Investigating the neural correlates of a streaming percept in an informational-masking paradigm. PLoS One 9:e114427

Showing the most recent 10 out of 21 publications
