To support an integrated global economy, people of all backgrounds must be able to work together effectively despite language barriers, and the development of Computer Aided Language Learning (CALL) and accent modification tools is a key part of making this possible. To support effective learning and provide specific, useful pronunciation feedback, systems for pronunciation correction must be able to capture and accurately describe errors in articulation. Accurate acoustic-to-articulator inversion, the estimation of articulatory trajectories from an acoustic signal, has the potential to significantly improve the accuracy and specificity of such feedback to language learners and to enhance methods for in-depth study of the articulatory patterns of both native speakers and second language learners.
This research addresses the problem of robust speaker-independent acoustic-to-articulator inversion, which is challenging because of the complexity of articulation patterns and significant inter-speaker differences. To overcome this difficulty, a novel speaker-independent inversion approach called Parallel Reference Speaker Weighting is being developed. It uses parallel acoustic-articulator adaptation to create speaker-specific models, represented in a normalized articulatory working space, for new speakers for whom no kinematic training data is available. The new approach is being evaluated on the Marquette University EMA-MAE Corpus of parallel acoustic and 3-D electromagnetic articulography data, which includes both American English and Mandarin Accented English speakers.
The primary impact of this work is on pronunciation assessment and accent modification systems, with potential contributions to numerous other speech technologies, including speech recognition, speech coding, and audio and video synthesis.