Units with a longer time basis than the traditional phones or sub-phones may provide a better structural basis for automatic speech recognition (ASR) systems customized for recognition of naturally spoken discourse. In this project, syllable-length acoustic models are being explored for the recognition of conversational speech. The project entails: definition of acoustic features derived from energy trajectories spanning ca. 250-ms intervals of speech; statistical modeling of these syllable-length regions; development of a decoding scheme designed to combine the outputs of these acoustic models with those of the more traditional phone- and sub-phone-length models; and embedding the syllable, phone, and sub-phone features into a multi-tiered representation of language designed for robust recognition under a wide range of acoustic-environmental and speaking conditions. A complete ASR system is being developed that will incorporate the results of this research and will be evaluated on fluent speech. Successful results with the recognition system have the potential to improve practical ASR systems that must decode spontaneous discourse. Additionally, the analysis of longer-time structure at the acoustic, statistical, and lexical levels should improve our basic knowledge about the structure of conversational speech.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
9712579
Program Officer
Ephraim P. Glinert
Project Start
Project End
Budget Start
1997-09-15
Budget End
2000-08-31
Support Year
Fiscal Year
1997
Total Cost
$784,518
Indirect Cost
Name
International Computer Science Institute
Department
Type
DUNS #
City
Berkeley
State
CA
Country
United States
Zip Code
94704