Automatic evaluation of speech quality

Watson, Charles

Abstract

Tests of several approaches to the automatic evaluation of the quality of speech segments are proposed. Previous systems for pronunciation training have typically either employed automatic speech-recognition (ASR) technology or used templates based on a limited number of utterances rated as excellent by L1 listeners (sometimes with a second set of utterances containing a common pronunciation error). Here, speech-processing technologies (hidden Markov models, HMMs, and artificial neural networks, ANNs) will be developed specifically for use as evaluation systems, not recognition systems, to predict the quality and locus-of-error judgments assigned by listeners. In this approach, termed "evaluation of single words" (ESW), the special feature of the systems derives from the training tokens employed in their development: multiple recordings of a single word made by groups of native and non-native talkers. Sixty talkers will be native speakers of Arabic whose intelligibility in English ranges from poor to near-perfect, and 60 will be native speakers of middle-American English. There will be twelve words, divided among one-, two-, and three-syllable words. Each talker will record ten productions of each word, yielding 14,400 tokens (120 talkers × 12 words × 10 productions). Each token will be rated by listening juries for pronunciation quality, and the tokens will also be grouped into perceptual clusters using multidimensional scaling (MDS) and cluster-analysis techniques. At least two computer-based evaluation systems (HMM and ANN) will be trained for each individual word, with the goals of predicting overall pronunciation quality and identifying specific, commonly occurring pronunciation errors. It is expected that these word-specific systems, each a discrete "evaluator" custom-built for an individual word, will approach the maximum accuracy that can be expected of this class of processors.
If successful, the ESW approach may have a broad range of uses in pronunciation training, identification of a speaker's L1, foreign-language instruction, and other non-lexical applications. However, our specific goal is the development of systems that can provide informative feedback during automated pronunciation training. In ASR applications, the goal is to respond the same way to a word no matter how it is pronounced. The goal of an ESW system, by contrast, is to respond differentially to pronunciation variants. This distinction between ASR and ESW is central to the development of successful evaluation systems, because it dictates different modeling constraints.
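The corpus figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the talker, word, and production counts come from the abstract, while the constant and function names are assumptions made for this example and are not part of the proposal.

```python
# Corpus design figures as stated in the abstract.
# All names below are hypothetical, chosen only for this illustration.
ARABIC_L1_TALKERS = 60      # native speakers of Arabic
ENGLISH_L1_TALKERS = 60     # native speakers of middle-American English
WORDS = 12                  # one-, two-, and three-syllable words
PRODUCTIONS_PER_WORD = 10   # recordings of each word by each talker


def corpus_size(talkers: int, words: int, productions: int) -> int:
    """Total number of recorded tokens for a fully crossed design."""
    return talkers * words * productions


total_talkers = ARABIC_L1_TALKERS + ENGLISH_L1_TALKERS
total_tokens = corpus_size(total_talkers, WORDS, PRODUCTIONS_PER_WORD)

# Each word-specific evaluator would see only the tokens of its own word.
tokens_per_word = total_tokens // WORDS
tokens_per_word_per_group = ARABIC_L1_TALKERS * PRODUCTIONS_PER_WORD

print(total_tokens)               # 14400
print(tokens_per_word)            # 1200
print(tokens_per_word_per_group)  # 600
```

Under these figures, each word-specific evaluator would have 1,200 tokens of its word available, 600 from each talker group, before any split into training and test sets (the abstract does not describe such a split).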

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43DC007255-01
Application #: 6882741
Study Section: Special Emphasis Panel (ZRG1-BBBP-B (10))
Program Officer: Shekim, Lana O

Project Start: 2004-09-24
Project End: 2005-09-30
Budget Start: 2004-09-24
Budget End: 2005-09-30
Support Year: 1
Fiscal Year: 2004
Total Cost: $99,973

Institution

Name: Communication Disorders Technology
DUNS #: 803046465

City: Bloomington
State: IN
Country: United States
Zip Code: 47408

Publications

Williams-Sanchez, Victoria; McArdle, Rachel A; Wilson, Richard H et al. (2014) Validation of a screening test of auditory function using the telephone. J Am Acad Audiol 25:937-51
