The research community's understanding of the speech-to-text problem has reached a point at which most challenges can in principle be met, given a baseline system, enough data from the target domain, and an expert who knows how to develop or adapt a recognizer for the target context of use. Unfortunately, this approach does not scale: despite the growing interest in speech user interfaces, only a limited number of experts are equipped to analyze a context of use and develop an accurate speech recognizer.

This EArly-concept Grant for Exploratory Research (EAGER) explores the possibility of formalizing a speech recognition expert's implicit knowledge of the required analysis and development steps in a rule-based knowledge base, which can help a speech recognition non-expert develop a speech recognizer as part of an application, such as a dialog system for a rare dialect. Speech recognition experts adapt and improve recognizers by listening to data, aggregating error reports, and then adjusting parameters, retraining models, or applying adaptation techniques based on their assessment of the mismatched context of use. This project extracts that intuition through contextual interviews with such experts, develops a proof-of-concept expert system to predict the gains a system would see from specific adaptation techniques, and explores the factors that make this approach feasible.

This project creates ways to make the development of speech-enabled applications accessible to a broader class of researchers, students, and practitioners, particularly those from the user interface community. It will make joint development of user interfaces and speech recognizers feasible without requiring large teams with varied skill sets.

Project Report

The research community's understanding of the speech-to-text problem has reached a point at which most challenges can in principle be met, given a baseline system, enough data from the target domain, and an expert who knows how to develop or adapt a recognizer for the target context of use. Unfortunately, this approach does not scale: despite the growing interest in speech user interfaces, there are very few experts equipped to analyze a context and develop an accurate speech recognizer. In this project, our goal was to understand what speech experts "intuitively" do while building speech recognizers for a new user group, language, or acoustic context, and to formalize it so that non-speech experts can benefit from it. The goal was not to render the speech expert superfluous, but to make it easier for non-speech experts, i.e., researchers or developers utilizing speech technology, to figure out why a speech system is failing and to guide their efforts in the right direction. More specifically, the goals were threefold: (i) to understand and formalize the tacit knowledge that speech experts employ while optimizing a recognizer for greater accuracy, (ii) to support semi-automatic analysis of the errors occurring in speech recognition, and (iii) to test the resulting rules and error analysis methodology through various experiments. In this work, we interviewed about 10 speech recognition experts and reviewed over 50 publications to formalize over 80 rules. We also developed a web-based, semi-automatic error analysis tool, which is available at http://speechkitchen.org/erroranalysis/. Finally, we tested the rules and error analysis strategy on two datasets and demonstrated that recommendations from the rule-based knowledge base can lead to recognizers whose accuracy is at least as high as that obtained by following the experts' own recommendations.
This demonstrates the fundamental feasibility of the proposed approach and helped us identify and understand remaining challenges, such as coverage of toolkit-specific techniques, as well as questions that could be resolved, for example how to handle conflicts between rules. This work has paved the way for a larger research goal: making the task of developing speech applications more accessible to a broader class of researchers, students, and practitioners. Insights from this work will be integrated into the "Speech Recognition Virtual Kitchen" (www.speechkitchen.org/) and will thus be made available to the research community, in order to enable progress towards that goal.
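The rule-based recommendation idea can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual knowledge base: the rule contents, the symptom names, and the expected-gain figures are all invented for the example, and conflicts between rules matching the same symptom are resolved here by simply preferring the rule with the larger expected gain.

```python
# Hypothetical sketch of a rule-based adaptation advisor. The rules,
# symptoms, and gain values below are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Rule:
    symptom: str            # observed error pattern in the context of use
    technique: str          # suggested adaptation technique
    expected_gain: float    # assumed absolute WER reduction (%), invented

RULES = [
    Rule("noisy_audio", "multi-condition training", 4.0),
    Rule("noisy_audio", "feature-space adaptation (fMLLR)", 2.5),
    Rule("accented_speech", "MAP acoustic-model adaptation", 3.0),
    Rule("out_of_vocabulary", "extend lexicon and LM with domain text", 5.0),
]

def recommend(symptoms):
    """Return one technique per observed symptom, resolving conflicts
    between matching rules by preferring the larger expected gain."""
    best = {}
    for rule in RULES:
        if rule.symptom in symptoms:
            current = best.get(rule.symptom)
            if current is None or rule.expected_gain > current.expected_gain:
                best[rule.symptom] = rule
    return {s: (r.technique, r.expected_gain) for s, r in best.items()}

print(recommend({"noisy_audio", "out_of_vocabulary"}))
```

In a real knowledge base the conflict-resolution policy would itself be a design question (the report notes this was one of the resolvable issues); expected gains could come from published results or from the experts' interview responses rather than fixed constants.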

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1247368
Program Officer
Tatiana Korelsky
Project Start
Project End
Budget Start
2012-08-01
Budget End
2014-01-31
Support Year
Fiscal Year
2012
Total Cost
$100,000
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213