Computer algorithms that analyze auditory scenes and extract individual sound sources would have a strong impact in several domains. For example, to support natural voice interaction with computing devices, an automatic speech recognition system must be able to focus on the voice of the person speaking to it and ignore sounds from all other sources. A hearing device must perform a similar task to allow a hearing-impaired person to conduct a conversation in a noisy, multiple-source environment. Building on recent advances in machine learning and signal processing, we are developing sophisticated adaptive algorithms for analyzing auditory scenes with multiple sound sources. Our algorithms are based on probabilistic models of different sound sources and of the manner in which they overlap one another and are distorted by reverberation and background noise. We use recent advanced techniques to infer our models from sound data captured by a microphone array, separate those data into individual sources, and automatically determine the type and location of each source present. Moreover, by reconstructing the clean signal of each individual sound source, we dramatically enhance the accuracy of automatic speech recognition for human speakers in multiple-source environments. To facilitate the development and evaluation of our algorithms, and to encourage competition among other research groups that will ultimately result in improved techniques, we are collecting a large dataset of multiple-source auditory scenes and making it publicly available on a dedicated website.
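To make the separation task concrete, the following is a minimal, self-contained sketch of blind source separation on a synthetic two-microphone mixture, using a FastICA-style fixed-point iteration in NumPy. This is an illustrative stand-in, not the project's algorithm: the abstract describes richer probabilistic models that also handle reverberation, noise, and source localization. The synthetic sources, mixing matrix, and iteration counts here are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)

# Two synthetic non-Gaussian "sources" standing in for real audio:
# a sine tone and a sawtooth wave.
s1 = np.sin(2 * np.pi * 5 * t)
s2 = 2 * ((3 * t) % 1) - 1
S = np.vstack([s1, s2])

# Unknown mixing: each "microphone" hears a different blend of the sources.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Step 1: center and whiten the microphone signals.
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / Xc.shape[1]
d, E = np.linalg.eigh(cov)
Z = (E @ np.diag(d ** -0.5) @ E.T) @ Xc

# Step 2: FastICA fixed-point iteration with a tanh nonlinearity,
# followed by symmetric decorrelation of the unmixing matrix.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt  # enforce W @ W.T = I

S_hat = W @ Z  # recovered sources (up to permutation and scale)
```

ICA recovers sources only up to permutation and sign, so an evaluation would match each estimate to the true source with the highest absolute correlation; real systems built on microphone arrays must additionally contend with convolutive (reverberant) mixing, which this instantaneous model ignores.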

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
0535251
Program Officer
Tatiana D. Korelsky
Project Start
Project End
Budget Start
2006-01-01
Budget End
2009-12-31
Support Year
Fiscal Year
2005
Total Cost
$375,000
Indirect Cost
Name
University of California San Diego
Department
Type
DUNS #
City
La Jolla
State
CA
Country
United States
Zip Code
92093