Over the last decade, speech recognition technology has become steadily more present in everyday life, as seen by the proliferation of applications including mobile personal agents and transcription of voicemail messages. Performance of these systems, however, degrades significantly in the presence of background noise; for example, using speech recognition technology in a noisy restaurant or on a windy street can be difficult because speech recognizers confuse the background noise with linguistic content. Compensation for noise typically involves preprocessing the acoustic signal to emphasize the speech signal (i.e. speech separation), and then feeding this processed input into the recognizer. The innovative approach in this project is to train the recognition and separation systems in an integrated manner so that the linguistic content of the signal can inform the separation, and vice versa.

Given the impact of the recent resurgence of Deep Neural Networks (DNNs) in speech processing, this project seeks to make DNNs more resistant to noise by integrating speech separation and speech recognition, exploring three related areas. The first research area seeks to stabilize input to DNNs by combining DNN-based suppression and acoustic modeling, integrating masking estimates across time and frequency, and using this information to improve reconstruction of speech from noisy input. The second area seeks to examine a richer DNN structure, using multi-task learning techniques to guide the construction of DNNs better at performing all tasks and where layers have meaningful structure. The final research area examines ways to adapt the spurious output of DNN acoustic models given acoustic noise. With the focus of integrating speech separation and recognition, the project will be evaluated both by measuring speech recognition performance, as well as metrics that are more closely related to human speech perception. This will ensure a broader impact of this research by providing insights not only to speech technology but also facilitating the design of next-generation hearing technology in the long run.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1409431
Program Officer
Tatiana D. Korelsky
Project Start
Project End
Budget Start
2014-06-01
Budget End
2017-05-31
Support Year
Fiscal Year
2014
Total Cost
$798,082
Indirect Cost
Name
Ohio State University
Department
Type
DUNS #
City
Columbus
State
OH
Country
United States
Zip Code
43210