Automated systems that can interact with human users in modalities such as speech and handwriting will greatly enhance productivity and system usability. These systems will allow simple access to information and services on the internet. The same capabilities are also essential to tasks such as enabling access by handicapped users or querying an on-line maintenance manual while performing intricate repairs. A statistical model of language is a crucial component in such systems, which convert between speech or handwriting and text, and in statistical machine translation systems. Most current algorithms for language modeling exhibit an acute myopia, basing their predictions of the next word on only a few immediately preceding words. When humans are faced with a comparable task, they easily outperform these models by using the richer linguistic information available to them from the more complete context. Researchers at CLSP propose to investigate and develop novel language modeling techniques that exploit this richer contextual information. They propose to examine models that use a variety of techniques to capture syntactic dependencies, as well as dynamic, hierarchical models of topic, and to combine the resulting models with the best current ones using the maximum entropy principle. This research will focus on improving the recognition accuracy of spontaneous human speech, but beyond that it will provide insight into new information sources and techniques applicable to language modeling in general.
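
To make the contrast concrete, the sketch below (illustrative only, not drawn from the award itself) shows a count-based trigram model, which predicts each word from just the two preceding words, together with a toy log-linear combination of that trigram score and a longer-range unigram "topic" score, in the spirit of the maximum entropy combination mentioned above. The corpus, smoothing constants, weights, and function names are all assumptions chosen for demonstration.

    # Illustrative sketch, not the award's method: a "myopic" trigram model
    # plus a toy log-linear (maximum-entropy-style) combination with a
    # longer-range topic-like unigram score.
    import math
    from collections import defaultdict

    def train_counts(sentences):
        """Collect trigram, bigram-history, and unigram counts from tokenized text."""
        tri, hist, uni = defaultdict(int), defaultdict(int), defaultdict(int)
        for toks in sentences:
            padded = ["<s>", "<s>"] + toks + ["</s>"]
            for i in range(2, len(padded)):
                h = (padded[i - 2], padded[i - 1])
                tri[h + (padded[i],)] += 1
                hist[h] += 1
                uni[padded[i]] += 1
        return tri, hist, uni

    def trigram_prob(w, h, tri, hist, V, alpha=0.5):
        """P(w | two preceding words) with add-alpha smoothing: the short-context view."""
        return (tri[h + (w,)] + alpha) / (hist[h] + alpha * V)

    def topic_prob(w, uni, V, alpha=0.5):
        """Stand-in for a longer-range topic model: a smoothed document unigram."""
        total = sum(uni.values())
        return (uni[w] + alpha) / (total + alpha * V)

    def loglinear_prob(w, h, vocab, tri, hist, uni, lambdas=(1.0, 0.5)):
        """Log-linear combination: P(w|h) proportional to exp(l1*log p_tri + l2*log p_topic)."""
        V = len(vocab)
        def score(x):
            return (lambdas[0] * math.log(trigram_prob(x, h, tri, hist, V))
                    + lambdas[1] * math.log(topic_prob(x, uni, V)))
        Z = sum(math.exp(score(x)) for x in vocab)  # normalize over the vocabulary
        return math.exp(score(w)) / Z

    # Tiny assumed corpus; the trigram score depends only on the two preceding
    # words, while the combined model also reflects document-level word usage.
    corpus = [["the", "cat", "sat"], ["the", "cat", "slept"]]
    tri, hist, uni = train_counts(corpus)
    vocab = {w for s in corpus for w in s} | {"</s>"}
    print(loglinear_prob("sat", ("the", "cat"), vocab, tri, hist, uni))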

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 9618874
Program Officer: Ephraim P. Glinert
Project Start:
Project End:
Budget Start: 1997-03-01
Budget End: 2001-02-28
Support Year:
Fiscal Year: 1996
Total Cost: $749,994
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218