Several approaches to the automatic evaluation of the pronunciation quality of speech segments will be tested. Previous systems for pronunciation training have typically employed either automatic speech-recognition (ASR) technology or templates based on a limited number of utterances rated as excellent by L1 listeners (sometimes supplemented by a second set of utterances containing a common pronunciation error). Here, speech-processing technologies (hidden Markov models [HMMs] and artificial neural networks [ANNs]) will be developed specifically as evaluation systems (not recognition systems) to predict the quality and locus-of-error judgments assigned by listeners. In this approach, termed "evaluation-of-single-words" (ESW), the distinguishing feature of the systems will be the training tokens used to develop them: multiple recordings of a single word made by groups of native and non-native talkers. Sixty talkers will be native speakers of Arabic whose intelligibility in English ranges from poor to near-perfect, and 60 will be native speakers of middle-American English. Twelve words will be used, divided among one-, two-, and three-syllable words. Each talker will record ten productions of each word, yielding 14,400 tokens (120 talkers × 12 words × 10 productions). Each token will be rated by listening juries for pronunciation quality, and the tokens will also be grouped into perceptual clusters using multidimensional scaling (MDS) and cluster-analysis techniques. At least two computer-based evaluation systems (an HMM and an ANN) will be trained for each word, with the goals of predicting overall pronunciation quality and identifying specific, commonly occurring pronunciation errors. These word-specific systems, each a discrete "evaluator" custom-built for an individual word, are expected to approach the maximum accuracy achievable by this class of processors.
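The perceptual-clustering step can be illustrated with a minimal sketch. It assumes, since the abstract does not say how, that listener judgments have already been turned into a symmetric token-by-token dissimilarity matrix. It applies classical (Torgerson) MDS and then k-means with a deterministic farthest-first initialization, which is only one possible stand-in for the unspecified "cluster-analysis techniques". The function names `classical_mds` and `kmeans_labels` and the toy data are hypothetical.

```python
import numpy as np


def classical_mds(dissim: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n tokens in n_dims dimensions from an
    n x n symmetric dissimilarity matrix (hypothetical helper)."""
    n = dissim.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dissim ** 2) @ centering
    # eigh returns the eigenvalues of the symmetric matrix B in ascending order
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:n_dims]
    # Clip small negative eigenvalues that arise when dissimilarities are not Euclidean
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def kmeans_labels(points: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """k-means (Lloyd's algorithm) with farthest-first initialization;
    returns one cluster label per token (hypothetical helper)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from every center chosen so far
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster emptied out
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy check: dissimilarities computed from two well-separated groups of "tokens"
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]], dtype=float)
dissim = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
coords = classical_mds(dissim, n_dims=2)
labels = kmeans_labels(coords, k=2)
print(labels)  # -> [0 0 0 1 1 1]
```

Because the toy dissimilarities are exact Euclidean distances in two dimensions, the two-dimensional MDS solution reproduces them exactly (up to rotation and reflection); real listener-derived dissimilarities would generally need more dimensions and would not be reproduced exactly.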
If successful, the ESW approach may have a broad range of applications, including pronunciation training, identification of a speaker's L1, foreign-language instruction, and other non-lexical tasks. However, our specific goal is the development of systems that can provide informative feedback during automated pronunciation training. In ASR, the goal is to respond to a word in the same way regardless of how it is pronounced; the goal of an ESW system is to respond differentially to pronunciation variants. This distinction between ASR and ESW is central to the development of successful evaluation systems because it imposes different modeling constraints.
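To make the ASR/ESW contrast concrete, here is a minimal sketch in which a diagonal-covariance Gaussian is swapped in for the HMM and ANN models the project describes; it is neither of those. It assumes each token has already been reduced to a fixed-length feature vector and that the listener-derived error clusters are available as labeled sets of tokens. The feature values, the example words "pat" and "sun", the cluster name "error_1", and all class and function names are hypothetical. The ASR-style decision gives the same word label to a native-like and an erroneous token, while the word-specific evaluator returns a graded quality score and the best-fitting category.

```python
import math
from dataclasses import dataclass

Vector = list[float]


@dataclass
class DiagGaussian:
    """Diagonal-covariance Gaussian: a simple stand-in for a per-word HMM or ANN."""
    mean: Vector
    var: Vector

    @classmethod
    def fit(cls, samples: list[Vector], floor: float = 1e-3) -> "DiagGaussian":
        n, dims = len(samples), len(samples[0])
        mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        # Variance floor keeps the density finite when a dimension barely varies
        var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, floor)
               for d in range(dims)]
        return cls(mean, var)

    def log_likelihood(self, x: Vector) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, self.mean, self.var))


def asr_decision(x: Vector, word_models: dict[str, DiagGaussian]) -> str:
    """ASR-style: pick the most likely word, so pronunciation variants collapse."""
    return max(word_models, key=lambda w: word_models[w].log_likelihood(x))


@dataclass
class WordEvaluator:
    """ESW-style: one evaluator per word that responds differentially to variants."""
    native: DiagGaussian              # fitted to native-talker tokens of this word
    errors: dict[str, DiagGaussian]   # one model per listener-defined error cluster

    def evaluate(self, x: Vector) -> tuple[float, str]:
        scores = {"native-like": self.native.log_likelihood(x)}
        scores.update({name: m.log_likelihood(x) for name, m in self.errors.items()})
        # Quality: fit to native productions (higher is better);
        # locus of error: the best-fitting category overall
        return scores["native-like"], max(scores, key=scores.get)


# Hypothetical 2-D feature vectors for two words
native_pat = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
error_pat = [[2.0, 1.0], [2.1, 0.9], [1.9, 1.1], [2.0, 1.05]]
sun = [[6.0, 6.0], [6.1, 5.9], [5.9, 6.1], [6.0, 6.05]]

# ASR pools every variant of a word into one model; ESW keeps them apart
word_models = {"pat": DiagGaussian.fit(native_pat + error_pat), "sun": DiagGaussian.fit(sun)}
pat_eval = WordEvaluator(DiagGaussian.fit(native_pat), {"error_1": DiagGaussian.fit(error_pat)})

good, bad = [1.0, 1.0], [2.0, 1.0]
print(asr_decision(good, word_models), asr_decision(bad, word_models))  # -> pat pat
print(pat_eval.evaluate(good)[1], pat_eval.evaluate(bad)[1])  # -> native-like error_1
```

Using the log-likelihood under the native model as the quality score is only one possible choice; in practice the score would need to be calibrated against the listening-jury ratings.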

Agency
National Institutes of Health (NIH)
Institute
National Institute on Deafness and Other Communication Disorders (NIDCD)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43DC007255-01
Application #
6882741
Study Section
Special Emphasis Panel (ZRG1-BBBP-B (10))
Program Officer
Shekim, Lana O
Project Start
2004-09-24
Project End
2005-09-30
Budget Start
2004-09-24
Budget End
2005-09-30
Support Year
1
Fiscal Year
2004
Total Cost
$99,973
Name
Communication Disorders Technology
DUNS #
803046465
City
Bloomington
State
IN
Country
United States
Zip Code
47408