Most current computer speech recognition systems model the speech signal with homogeneous observation frames, represent words as strings of phonemes, and rely heavily on statistical word-based language models to decode the underlying word sequence. This project investigates an alternative approach that incorporates many more levels of linguistic information into a parsimonious hierarchical framework for speech recognition and understanding. The approach will provide new perspectives on combining constraints from the distinctive feature, phonetic, phonological, syllabic, morphological, lexical, syntactic, and semantic levels within a single probabilistic framework. Sharing sub-word structure across words will allow phonological effects to generalize across similar environments and will give greater flexibility for dynamic vocabularies and language models. Structure sharing should also yield a more efficient search with fewer parameters. The proposed hierarchical framework could also serve as a recognition kernel, with the speech signal as input and a set of morpho-phonological units as output. This kernel would have a finite inventory of units for a given language, and its internals would be vocabulary- and task-independent. To ensure that the proposed framework is language-independent, its utility will also be investigated for languages other than English.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9618731
Program Officer: Ephraim P. Glinert
Budget Start: 1997-03-01
Budget End: 2001-02-28
Fiscal Year: 1996
Total Cost: $710,543
Name: Massachusetts Institute of Technology
City: Cambridge
State: MA
Country: United States
Zip Code: 02139