Algorithms that separate audio sources have many potential uses, such as extracting important audio data from historic recordings or helping people with hearing impairments select what to amplify and what to suppress in their hearing aids. Computer processing of audio content can potentially isolate the sound sources of interest and improve audio clarity whenever the content exhibits interference from multiple sound sources, for example by extracting a single voice of interest from a room full of voices. However, current sound source identification and separation methods are reliable only when there is a single predominant sound. This project will develop the science and technology needed to more easily isolate a single sound source from audio content with multiple competing sources, and to build interactive computer systems that guide users through an interactive source separation process, permitting the separation and recombination of sound sources in a manner that is beyond the reach of existing audio software. The outcomes of the project will improve the feasibility of speech recognition in environments with multiple talkers, will be useful for many scientific inquiries such as biodiversity monitoring through the automated analysis of field recordings, and will be broadly useful whenever manual tagging of audio data is not practical.

While many computational auditory scene analysis algorithms have been proposed to separate audio scenes into individual sources, current methods are brittle and difficult to use, and as a result they have not been broadly adopted by potential users. The methods are brittle in that each algorithm relies on a single cue to separate sources; if that cue is not reliable, the method fails. The methods are difficult to use because there is no good way to predict which audio scenes a given algorithm is likely to work on, so the user does not know which method to apply in any given case, and because their control parameters are hard to understand for users who lack expertise in signal processing. This project will research how to integrate multiple source separation algorithms into a single framework, and how to improve ease of use by exploring interfaces that let users interactively define what they wish to isolate in an audio scene and that let the system guide users in selecting a tool and setting the necessary parameters. The project will produce an open-source audio source separation tool that embodies these scientific research outcomes.
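To make the brittleness concrete, the following is a minimal illustrative sketch (not the project's method) of single-cue separation: a binary time-frequency mask driven by one spatial cue, the inter-channel level difference of a stereo mixture. All signal names and parameter values here are hypothetical. When the sources sit at distinct spatial positions the cue works; if they shared a position the cue would carry no information and the mask would fail, which is the kind of failure that motivates combining multiple cues and algorithms.

# Illustrative sketch only: single-cue source separation via a binary
# time-frequency mask on the inter-channel level difference of a stereo mix.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)

# Two synthetic "sources": a low tone panned left, a high tone panned right.
src_a = 0.5 * np.sin(2 * np.pi * 220 * t)
src_b = 0.5 * np.sin(2 * np.pi * 1760 * t)
left = 0.9 * src_a + 0.1 * src_b
right = 0.1 * src_a + 0.9 * src_b

# One cue: per-bin level ratio between the two channels of the mixture.
_, _, L = stft(left, fs=fs, nperseg=1024)
_, _, R = stft(right, fs=fs, nperseg=1024)
level_ratio = np.abs(L) / (np.abs(R) + 1e-12)

# Binary mask from that single cue: bins louder on the left go to source A.
mask_a = level_ratio > 1.0
_, est_a = istft(L * mask_a, fs=fs, nperseg=1024)
_, est_b = istft(R * ~mask_a, fs=fs, nperseg=1024)

# If both sources were panned identically, level_ratio would be uninformative
# and the mask would assign bins essentially at random -- the single-cue
# brittleness described above.
print("estimated source A RMS:", np.sqrt(np.mean(est_a ** 2)))
print("estimated source B RMS:", np.sqrt(np.mean(est_b ** 2)))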

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant
Application #: 1420971
Program Officer: Ephraim Glinert
Budget Start: 2014-10-01
Budget End: 2018-09-30
Fiscal Year: 2014
Total Cost: $514,261
Name: Northwestern University at Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60611