Kevin Knight, University of Southern California, $143,499 - 12 mos.
Robust, Scalable Language Generation Using Symbolic and Statistical Techniques
This is a three-year continuing award. Natural language processing comprises both language analysis (text in) and language generation (text out). While many applications need both capabilities, the bulk of research and development has so far focused on the former. As a result, weaknesses in many practical systems (e.g., translation, explanation, dialogue) are traceable to classic problems in natural language generation (NLG). New statistical techniques have made it possible to address classic problems by extracting knowledge automatically from online text corpora, but so far these techniques have been applied primarily to language analysis, not to NLG. For example, word-sense disambiguation (word to concept) has attracted intense recent study, while lexical selection (concept to word) has received comparatively little attention. A similar discrepancy exists between sentence parsing (analysis) and sentence structuring (generation). While trainable parsers can now operate on unrestricted text, NLG usually requires perfect inputs and relies on handcrafted, domain-specific knowledge. We believe that statistical methods have the potential to improve NLG technology in the near term, to enable new applications, and to open up new research problems. Our research will emphasize accuracy, scalability, robustness, and evaluation; it will combine hand-built grammars, online lexical resources, and novel "learning by reading" approaches for gathering knowledge automatically from online texts.