Why are humans so much better than machines at recognizing speech? This research aims to measure differences between humans and machines in how they compute similarities between sounds. A computational model of speech perception will be trained on speech production data, using several types of features that are typically used in speech recognition systems. It will then be tested on its ability to predict human listeners' responses in speech sound discrimination tasks. Results are expected to provide information about how the speech that listeners hear shapes their perception of sounds, as well as how well the information used by automatic speech recognition systems matches the information used by human listeners.
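As a rough illustration of the kind of evaluation described above (not the project's actual model), the sketch below predicts a listener's response in an ABX discrimination trial by comparing distances between acoustic feature sequences. The function names (dtw_distance, predict_abx) are invented for this example, and the random arrays stand in for real feature representations such as MFCCs or model posteriors.

```python
# Hypothetical sketch: predicting an ABX discrimination response
# from distances in an acoustic feature space.
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences
    (frames x dims), using Euclidean frame-level distances."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def predict_abx(x, a, b):
    """Predict which of A or B a listener would judge more similar to X:
    'A' if X is closer to A in feature space, otherwise 'B'."""
    return 'A' if dtw_distance(x, a) <= dtw_distance(x, b) else 'B'

# Random "feature sequences" standing in for features extracted
# from real recordings of two speech sound categories.
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 13))                 # token of category A
B = rng.normal(loc=2.0, size=(35, 13))        # token of category B
X = A + rng.normal(scale=0.1, size=A.shape)   # another token of A
print(predict_abx(X, A, B))                   # expected: 'A'
```

Different feature types can be swapped into this comparison, and the predicted responses can then be checked against human listeners' responses on the same trials.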

By allowing us to compare how humans and speech recognition systems use information when perceiving speech, this research will provide a tool that can help make speech recognition systems more human-like. Reverse engineering human perception can improve the way these systems generalize to new dialects, talkers, and noise conditions. This has the potential to facilitate the construction of systems for low-resource languages, broadening the impact of speech recognition technologies.

[Supported by SBE/BCS/PAC and CISE/IIS/RI]

Project Start:
Project End:
Budget Start: 2013-09-01
Budget End: 2016-08-31
Support Year:
Fiscal Year: 2013
Total Cost: $179,724
Indirect Cost:
Name: University of Maryland College Park
Department:
Type:
DUNS #:
City: College Park
State: MD
Country: United States
Zip Code: 20742