It has long been postulated that a human determines the linguistic identity of a sound based on evidence detected at various levels of the speech knowledge hierarchy, from acoustics to pragmatics. Indeed, people do not continuously convert a speech signal into words as an automatic speech recognition (ASR) system attempts to do. Instead, they detect acoustic and auditory cues, weigh and combine them to form cognitive hypotheses, and then validate those hypotheses until consistent decisions are reached. This human model of speech processing suggests a candidate framework for developing next-generation speech technologies with the potential to go beyond current limitations.

In order to bridge the performance gap between ASR systems and humans, the narrow notion of speech-to-text in ASR has to be expanded to incorporate all related human information "hidden" in speech utterances. Instead of the conventional top-down, network-decoding paradigm for ASR, we are establishing a bottom-up, event-detection and evidence-combination paradigm for speech research to facilitate collaborative Automatic Speech Attribute Transcription (ASAT). The goals of the proposed project are to: (1) develop feature-detection and knowledge-integration modules to demonstrate ASAT and ASR; (2) build an open-source, highly shared, plug-'n'-play ASAT cyberinfrastructure for collaborative research that lowers the barriers to entry in ASR; and (3) provide an objective evaluation methodology to monitor technology advances in individual modules and across the entire system.
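The abstract does not specify the attribute inventory or the combination rule, but the bottom-up idea can be made concrete. Below is a minimal, purely illustrative Python sketch assuming per-frame attribute detectors (e.g., voicing, nasality) that emit posterior probabilities, merged into phone evidence scores by a weighted log-linear rule; the names PHONE_ATTRIBUTES, combine_evidence, and the attribute set are hypothetical, not part of the ASAT project itself.

```python
import math

# Hypothetical mapping from phones to the articulatory attributes that
# support them; the actual ASAT attribute inventory is not given in the
# abstract, so this table is illustrative only.
PHONE_ATTRIBUTES = {
    "m": {"nasal", "voiced", "labial"},
    "b": {"stop", "voiced", "labial"},
    "p": {"stop", "unvoiced", "labial"},
}

def combine_evidence(attribute_posteriors, weights=None):
    """Combine per-frame attribute posteriors into phone evidence scores.

    A stand-in for an ASAT-style evidence combiner: for each phone, take
    a weighted average of the log-posteriors of its supporting attributes.

    attribute_posteriors: dict of attribute name -> P(attribute | frame)
    weights: optional dict of per-attribute reliability weights (default 1.0)
    Returns: dict of phone -> combined log-evidence score.
    """
    weights = weights or {}
    scores = {}
    for phone, attrs in PHONE_ATTRIBUTES.items():
        score = 0.0
        for attr in attrs:
            p = attribute_posteriors.get(attr, 0.5)   # 0.5 = uninformative
            w = weights.get(attr, 1.0)
            score += w * math.log(max(p, 1e-10))      # floor to avoid log(0)
        scores[phone] = score / len(attrs)            # normalize per attribute
    return scores

# Example: a frame where the voicing and nasality detectors fire strongly.
frame = {"nasal": 0.9, "voiced": 0.95, "labial": 0.8,
         "stop": 0.1, "unvoiced": 0.05}
ranked = sorted(combine_evidence(frame).items(), key=lambda kv: -kv[1])
print(ranked)  # "m" scores highest for this frame
```

In a full system, such per-frame scores would feed a higher-level knowledge-integration stage that validates hypotheses over time, mirroring the detect-weigh-combine-validate loop described above.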

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant
Application #: 0427413
Program Officer: Tatiana D. Korelsky
Budget Start: 2004-09-15
Budget End: 2011-02-28
Fiscal Year: 2004
Total Cost: $1,758,900
Name: Georgia Tech Research Corporation
City: Atlanta
State: GA
Country: United States
Zip Code: 30332