Building better speech recognizers requires augmenting n-gram grammars with sophisticated but probabilistic linguistic knowledge. This project is building probabilistic models of two important pieces of syntactic/semantic knowledge: verb-argument constraints and semantic text coherence.

(1) Verbs place strong constraints on the syntax and semantics of their arguments. This project is computing probabilities for the different argument structures that can co-occur with each verb and using these probabilities to augment standard trigram language models.

(2) Texts and discourses tend to be semantically coherent; in particular, the words that occur in a text tend to be semantically related to one another. This project is applying a model of word meaning called Latent Semantic Analysis (LSA) to language models for automatic speech recognition (ASR). In LSA, a large matrix of word co-occurrence probabilities is computed and then smoothed via Singular Value Decomposition, yielding a generalized measure of semantic word similarity. A trigram model can then raise the probability of words that are semantically similar to words already seen in the text.

Beyond their possible applications in speech recognition language models, word-sense disambiguation, and parsing, these two stochastic models of linguistic knowledge help bridge the gap between the structural models used in linguistics and the statistical models of speech engineering.
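The abstract does not spell out how the verb-argument probabilities are estimated; a minimal sketch of one standard approach is to take relative-frequency counts of (verb, argument-frame) pairs from a parsed corpus. The verbs, frame labels, and toy data below are illustrative assumptions, not part of the project description.

```python
from collections import Counter, defaultdict

# Hypothetical (verb, argument-frame) observations, e.g. extracted from a
# parsed corpus; both the verbs and the frame labels are illustrative.
observations = [
    ("give", "NP NP"), ("give", "NP PP"), ("give", "NP PP"),
    ("sleep", "NONE"), ("sleep", "NONE"), ("sleep", "PP"),
]

# Count how often each argument frame occurs with each verb.
frame_counts = defaultdict(Counter)
for verb, frame in observations:
    frame_counts[verb][frame] += 1

def p_frame_given_verb(frame, verb):
    """Relative-frequency estimate of P(frame | verb)."""
    total = sum(frame_counts[verb].values())
    return frame_counts[verb][frame] / total if total else 0.0

print(p_frame_given_verb("NP PP", "give"))   # 2/3
print(p_frame_given_verb("NONE", "sleep"))   # 2/3
```

How these probabilities are then combined with the trigram model is not specified in the abstract, so no particular interpolation scheme is shown here.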
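The LSA pipeline described above (a word co-occurrence matrix smoothed by Singular Value Decomposition, yielding a generalized word-similarity measure) can be sketched in a few lines. The toy vocabulary, the counts, and the choice of two retained dimensions are assumptions for illustration only.

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows = words); in practice the
# matrix would be built from a large corpus and its entries would be
# co-occurrence probabilities or weighted counts.
vocab = ["doctor", "nurse", "hospital", "guitar", "drums"]
counts = np.array([
    [10.0, 8.0, 1.0, 0.0],
    [ 9.0, 7.0, 0.0, 1.0],
    [ 8.0, 9.0, 1.0, 0.0],
    [ 0.0, 1.0, 9.0, 8.0],
    [ 1.0, 0.0, 8.0, 9.0],
])

# Smooth the matrix with a truncated SVD; k = 2 retained dimensions is an
# arbitrary choice for this toy example.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * s[:k]          # reduced-dimension word representations

def similarity(w1, w2):
    """Cosine similarity between two words in the reduced LSA space."""
    v1 = word_vectors[vocab.index(w1)]
    v2 = word_vectors[vocab.index(w2)]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

print(similarity("doctor", "nurse"))    # high: semantically related words
print(similarity("doctor", "guitar"))   # low: unrelated words
```

Such a similarity score could then be used, as the abstract suggests, to raise the probability of words that are semantically related to words already seen in the text, although the abstract does not specify how the score is folded into the trigram model.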

Project Start:
Project End:
Budget Start: 1997-01-15
Budget End: 1997-12-31
Support Year:
Fiscal Year: 1997
Total Cost: $50,000
Indirect Cost:
Name: University of Colorado at Boulder
Department:
Type:
DUNS #:
City: Boulder
State: CO
Country: United States
Zip Code: 80309