Proposal Title: RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition
Institution: Toyota Technological Institute at Chicago
Abstract Date: 05/22/09
Proposal: 0905633
PI Name: Livescu, Karen
Printed from eJacket: 06/10/09

This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

One of the main challenges in automatic speech recognition is variability in speaking style, including speaking rate changes and coarticulation. Models of the articulators (such as the lips and tongue) can succinctly represent much of this variability. Most previous work on articulatory models has focused on the relationship between acoustics and articulation, but more significant improvements require models of the hidden articulatory state structure. This work has both a technological goal of improving recognition and a scientific goal of better understanding articulatory phenomena.

The project considers larger model classes than previously studied. In particular, the project develops graphical models, including dynamic Bayesian networks and conditional random fields, designed to take advantage of articulatory knowledge. A new framework for hybrid directed and undirected graphical models is being developed, in recognition of the benefits of both directed and undirected models, and of both generative and discriminative training. The project activities include major extension of earlier articulatory models with context modeling, asynchrony structures, and specialized training; development of factored conditional random field models of articulatory variables; and discriminative training to alleviate word confusability.

The scientific goal addresses questions about the ways in which articulatory trajectories vary in different contexts. Existing databases are used, and initial work in manual articulatory annotation is being extended. In addition, the project uses articulatory models to perform forced transcription of larger data sets, providing an additional resource for the research community.

Other broad impacts include new models and techniques with applicability to other time-series modeling problems. Extending the applicability of speech recognition will help it fulfill its promise of enabling more efficient storage of and access to spoken information, and equalizing the technological playing field for those with hearing or motor disabilities.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0905341
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2009-07-01
Budget End: 2013-06-30
Support Year:
Fiscal Year: 2009
Total Cost: $378,000
Indirect Cost:
Name: University of Washington
Department:
Type:
DUNS #:
City: Seattle
State: WA
Country: United States
Zip Code: 98195