The long-term objective of the proposed work is to develop speech-production assessment and pronunciation-training tools for children with speech sound disorders. The technology resulting from research on computer-assisted pronunciation training has not yet been successfully extended to help children with speech sound disorders, primarily because phoneme-level analysis of the speech signal is not yet accurate enough. The goal of the proposed exploratory research is to develop a set of algorithms that will constitute the core components of an effective pronunciation analysis system for these children. The components of this system, when used in concert, will reliably identify a phoneme within an isolated target word and score its intelligibility. The algorithms will also identify specific types of distortion errors (e.g., fronting, in which the /sh/ phoneme is realized as /s/). The tools resulting from the proposed work will provide immediate, relevant, and understandable feedback about pronunciation errors.
The Specific Aims are to (1) create individualized speech templates for use in objective analysis of pronunciation, (2) automatically identify phoneme locations in speech recordings, and (3) automatically score phoneme intelligibility for children with speech sound disorders.
For Specific Aim 1, the template for evaluating a participant's spoken word will be selected from a large pool of templates of that word, and each selected template will be further individualized to match the general spectral characteristics of the participant.
For Specific Aim 2, the primary challenge is to identify phoneme locations when the observed (spoken) phoneme sequence differs from the expected (target) phoneme sequence. A five-step process will be used to identify possible differences between the observed and expected phoneme sequences using several independent sources of information. Methods will include automatic classification of manner of articulation using a hidden Markov model, dynamic time warping, and a priori determination of likely phoneme errors.
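As an illustration of one of the named methods, the following is a minimal sketch of textbook dynamic time warping between a target template and an observed utterance. It is not the proposed five-step process, whose details are not given here. The function name `dtw_align`, the use of one scalar feature per frame, and the absolute-difference local cost are all assumptions made for brevity; a practical system would more plausibly compare multi-dimensional spectral frames.

```python
from math import inf


def dtw_align(template, observed):
    """Align two 1-D feature sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of
    (template_index, observed_index) pairs from start to end.
    Hypothetical helper for illustration only.
    """
    n, m = len(template), len(observed)
    # cost[i][j] = best cumulative cost aligning the first i template
    # frames with the first j observed frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - observed[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # match
                cost[i - 1][j],      # template frame repeated on same observed frame
                cost[i][j - 1],      # observed frame repeated on same template frame
            )

    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: the observed sequence lingers on the middle value.
total_cost, alignment = dtw_align([1, 2, 3], [1, 2, 2, 3])
```

In this toy case the alignment maps the single middle template frame onto both middle observed frames, which is how a known phoneme boundary in a template could be carried over to a slower or faster utterance.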
Specific Aim 3 will provide a measure of the intelligibility of a target phoneme and also identify distorted features. Intelligibility will be scored using a proposed Phoneme Intelligibility Analysis (PIA) module, which is phoneme-specific and composed of six sources of information: an acoustic template of the target phoneme, likely phonetic substitutions, acoustic features used in analysis, thresholds of acceptability, statistics of phoneme duration in the given context, and evaluation metrics. The use of human perceptual data (intelligibility scores) as training data is an important and novel component of the proposed approach.
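To make the six information sources concrete, the sketch below shows one way a phoneme-specific PIA module might be laid out as a data structure. Every class name, field name, and numeric value here is hypothetical and chosen only for illustration; none is specified by the proposed work. The duration check is a simple z-score, offered as an assumed example of how duration statistics could be used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PIAModule:
    """Hypothetical container for the six PIA information sources."""
    target_phoneme: str
    acoustic_template: List[List[float]]      # 1. template frames x features
    likely_substitutions: List[str]           # 2. e.g. /s/ for /sh/ (fronting)
    feature_names: List[str]                  # 3. acoustic features analyzed
    thresholds: Dict[str, float]              # 4. thresholds of acceptability
    duration_mean_ms: float                   # 5. duration statistics in context
    duration_std_ms: float
    metric_names: List[str] = field(default_factory=list)  # 6. evaluation metrics

    def duration_z_score(self, observed_ms: float) -> float:
        """How many standard deviations the observed duration is from the mean."""
        return (observed_ms - self.duration_mean_ms) / self.duration_std_ms


# Made-up values for a /sh/ module; not measured data.
sh_module = PIAModule(
    target_phoneme="sh",
    acoustic_template=[[0.1, 0.9], [0.2, 0.8]],
    likely_substitutions=["s"],
    feature_names=["spectral_centroid", "spectral_tilt"],
    thresholds={"duration_z": 2.5},
    duration_mean_ms=150.0,
    duration_std_ms=30.0,
    metric_names=["template_distance"],
)

z = sh_module.duration_z_score(210.0)
duration_acceptable = abs(z) <= sh_module.thresholds["duration_z"]
```

Keeping one such object per phoneme reflects the statement that the module is phoneme-specific; how the thresholds would be trained from human intelligibility scores is left open here.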
The proposed work is relevant to public health in that the resulting software tools will enable children with speech sound disorders to communicate better with the general population. Furthermore, these tools will assist teachers of such children in the task of pronunciation assessment, allowing them to use their time more effectively.