Landmark-based Robust Speech Recognition using Prosody-guided Models of Speech Variability

Alwan, Abeer

Abstract

Date 04/11/2007 Despite great strides in the development of automatic speech recognition technology, we do not yet have a system with performance comparable to humans in automatically transcribing unrestricted conversational speech, representing many speakers and dialects, and embedded in adverse acoustic environments. This approach applies new high-dimensional machine learning techniques, constrained by empirical and theoretical studies of speech production and perception, to learn from data the information structures that human listeners extract from speech. To do this, we will develop large-vocabulary psychologically realistic models of speech acoustics, pronunciation variability, prosody, and syntax by deriving knowledge representations that reflect those proposed for human speech production and speech perception, using machine learning techniques to adjust the parameters of all knowledge representations simultaneously in order to minimize the structural risk of the recognizer. The team will develop nonlinear acoustic landmark detectors and pattern classifiers that integrate auditory-based signal processing and acoustic phonetic processing, are invariant to noise, change in speaker characteristics and reverberation, and can be learned in a semi-supervised fashion from labeled and unlabeled data. In addition, they will use variable frame rate analysis, which will allow for multi-resolution analysis, as well as implement lexical access based on gesture, using a variety of training data. The work will improve communication and collaboration between people and machines and also improve understanding of how human produce and perceive speech. The work brings together a team of experts in speech processing, acoustic phonetics, prosody, gestural phonology, statistical pattern matching, language modeling, and speech perception, with faculty across engineering, computer science and linguistics. Support and engagement of students and postdoctoral fellows are part of the project, engaging in speech modeling and algorithm development. Finally, the proposed work will result in a set of databases and tools that will be disseminated to serve the research and education community at large.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0703805
Program Officer: Tatiana D. Korelsky

Project Start
Project End
Budget Start: 2007-06-01
Budget End: 2011-05-31
Support Year
Fiscal Year: 2007
Total Cost: $263,000
Indirect Cost

Landmark-based Robust Speech Recognition using Prosody-guided Models of Speech Variability
Alwan, Abeer
University of California Los Angeles, Los Angeles, CA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments