Kevin Knight, University of Southern California, $143,499 - 12 mos.
Robust, Scalable Language Generation Using Symbolic and Statistical Techniques
This is a three-year continuing award. Natural language processing comprises both language analysis (text in) and language generation (text out). While many applications need both capabilities, the bulk of research and development has so far focused on the former. As a result, weaknesses in many practical systems (e.g., translation, explanation, dialogue) are traceable to classic problems in natural language generation (NLG). New statistical techniques have made it possible to address classic problems by extracting knowledge automatically from online text corpora, but so far these techniques have been applied primarily to language analysis, not to NLG. For example, word-sense disambiguation (word to concept) has attracted intense recent study, while lexical selection (concept to word) has received comparatively little attention. A similar discrepancy exists between sentence parsing (analysis) and sentence structuring (generation). While trainable parsers can now operate on unrestricted text, NLG usually requires perfect inputs and relies on handcrafted, domain-specific knowledge. We believe that statistical methods have the potential to improve NLG technology in the near term, to enable new applications, and to open up new research problems. Our research will emphasize accuracy, scalability, robustness, and evaluation; it will combine hand-built grammars, online lexical resources, and novel "learning by reading" approaches for gathering knowledge automatically from online texts.