Discriminatively trained conditional models have been applied with great success to language modeling problems for applications such as speech recognition and machine translation (MT), and have been demonstrated to consistently outperform generative modeling approaches. Unfortunately, in contrast to generative approaches, discriminative modeling is an exclusively supervised approach, requiring costly manually annotated training data. Yet there is an important difference between language modeling and other natural language processing (NLP) tasks that are modeled discriminatively. For other NLP tasks, sequences of words are the input and some hidden structure, e.g., a parse tree, is the output; for language modeling tasks, word sequences are the output given some input, such as a source language string being translated. Large text resources that are not paired with an input of interest nevertheless provide examples of well-formed outputs, each of which would be 'correct' for any input that produces it. The novel perspective in this exploratory proposal is the recognition of this fundamental difference between typical NLP tasks, where there may be ample inputs but outputs must be manually annotated, and language modeling, where there are ample outputs with no corresponding inputs. This project explores methods for simulating inputs for observed word sequences, and for using these simulated inputs with a particular MT system to produce a set of word sequences that are alternatives to (and confusable with) the original observed sequence. These sets of alternative sequences can then be used with conditional/discriminative estimation techniques, even though no supervision (manual translation of source strings) was required to produce the training data.
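
The idea of training discriminatively from unpaired text can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the project's method: the `CONFUSIONS` table is a hypothetical stand-in for "simulate an input, then decode it with an MT system", bigram counts are an arbitrary feature choice, and a structured perceptron is only one of several conditional estimation techniques that could be used. Each observed sentence is treated as the reference within its own set of confusable alternatives.

```python
from collections import Counter

# Hypothetical confusion table. It stands in for the step of simulating an
# input and decoding it with an MT system, which the abstract does not detail.
CONFUSIONS = {
    "their": ["there"],
    "to": ["too", "two"],
    "house": ["home"],
}

def confusable_set(sentence):
    """Return alternatives that differ from `sentence` in exactly one word."""
    words = sentence.split()
    alternatives = []
    for i, word in enumerate(words):
        for sub in CONFUSIONS.get(word, []):
            alternatives.append(" ".join(words[:i] + [sub] + words[i + 1:]))
    return alternatives

def bigram_features(sentence):
    """Count bigrams, with sentence boundary markers added."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(zip(words, words[1:]))

def score(weights, sentence):
    """Linear model score: dot product of weights and bigram counts."""
    return sum(weights.get(f, 0.0) * c
               for f, c in bigram_features(sentence).items())

def train_perceptron(corpus, epochs=5):
    """Structured perceptron with the observed sentence as the reference."""
    weights = {}
    for _ in range(epochs):
        for observed in corpus:
            # Alternatives come first so that ties go against the reference,
            # which forces an update while the model cannot yet separate them.
            candidates = confusable_set(observed) + [observed]
            predicted = max(candidates, key=lambda s: score(weights, s))
            if predicted != observed:
                for f, c in bigram_features(observed).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in bigram_features(predicted).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights

corpus = ["they went to their house"]
weights = train_perceptron(corpus)
```

After training on this toy corpus, the observed sentence scores strictly higher than each of its simulated alternatives, and no manually paired input was needed to build the training set.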

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0741585
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2009-02-28
Support Year:
Fiscal Year: 2007
Total Cost: $105,800
Indirect Cost:

Name: Oregon Health and Science University
Department:
Type:
DUNS #:
City: Portland
State: OR
Country: United States
Zip Code: 97239