This project will use new technologies for measuring brain activity to understand in detail how human listeners separate competing, overlapping voices, and thereby to help design automatic systems capable of the same feat. Natural environments are full of overlapping sounds, and successful audio processing by both humans and machines relies on a fundamental ability to separate out the sound sources of interest. This ability is commonly referred to as the "cocktail party effect," after the ease with which people can follow what a single talker is saying despite the noisy background of other speakers. Despite the long history of research in hearing, this exceptional human capability for sound source separation is still poorly understood, and machine efforts to separate overlapping voices are correspondingly crude: although great advances have been made in robust machine processing of noisy speech, separation of complex natural sounds such as overlapping voices remains a challenge.

Advances in sensor technology now make it possible to model this function in humans, giving an unprecedented, detailed view of how sound is represented and processed in the brain. The project works specifically with measurements of the neuroelectric response made directly on the surface of the human cortex (currently with a 256-electrode sensor array) from patients awaiting neurosurgery. Using such measurements made for controlled mixtures of voices, the project will develop models of voice separation in the human cortex by reconstructing an approximation to the acoustic stimulus from the neural population response, in the process learning the linear mapping from the neural response back to a spectrogram representation of the stimulus.

To significantly improve the ability of machine algorithms to mimic human source separation, the project will also focus on a signal processing framework that supports experiments with different combinations of cues and strategies, chosen to optimize agreement with the recorded neural activity. The engineering model is based on the Computational Auditory Scene Analysis (CASA) framework, a family of approaches that has shown competitive results on sound mixtures.
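As a rough illustration of the stimulus-reconstruction step described above (not the project's actual implementation), the sketch below fits a ridge-regularized linear mapping from a multichannel neural response back to a stimulus spectrogram. All function names, array shapes, the lag window, and the regularization value are assumptions made for the example; random arrays stand in for real cortical recordings and speech spectrograms.

```python
# Minimal sketch of linear stimulus reconstruction. Synthetic random arrays
# stand in for real cortical-surface recordings and speech spectrograms;
# the lag window, ridge penalty, and array sizes are illustrative choices.
import numpy as np

def build_lagged_design(resp, n_lags):
    """Stack time-shifted copies of the neural response (time x electrodes)
    so each spectrogram frame is predicted from the window of neural
    activity that follows it (the neural response lags the stimulus)."""
    n_t, n_e = resp.shape
    X = np.zeros((n_t, n_e * n_lags))
    for lag in range(n_lags):
        X[:n_t - lag, lag * n_e:(lag + 1) * n_e] = resp[lag:, :]
    return X

def fit_reconstruction_filter(resp, spec, n_lags=10, lam=1.0):
    """Ridge-regularized least squares: find G minimizing
    ||X @ G - spec||^2 + lam * ||G||^2, where X is the lagged response."""
    X = build_lagged_design(resp, n_lags)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ spec)

def reconstruct(resp, G, n_lags=10):
    """Apply the learned mapping to (ideally held-out) responses to obtain
    an estimated spectrogram (time x frequency bands)."""
    return build_lagged_design(resp, n_lags) @ G

# Toy usage: 2000 time frames, 256 electrodes, 32 spectrogram bands.
rng = np.random.default_rng(0)
resp = rng.standard_normal((2000, 256))   # neural response, time x electrodes
spec = rng.standard_normal((2000, 32))    # target spectrogram, time x bands
G = fit_reconstruction_filter(resp, spec)
print(reconstruct(resp, G).shape)         # (2000, 32)
```

In practice the response matrix would hold cortical-surface recordings time-aligned to a controlled mixture of voices, and the reconstructed spectrogram would be compared against the spectrograms of the individual talkers to assess how well the cortex separates them; ridge regularization is used here simply because the lagged design matrix has many correlated columns.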

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1320366
Program Officer: Kenneth C. Whang
Budget Start: 2013-09-15
Budget End: 2015-08-31
Fiscal Year: 2013
Total Cost: $180,006
Institution: University of California San Francisco
City: San Francisco
State: CA
Country: United States
Zip Code: 94103