This project investigates a novel approach for assessing the fluency and grammaticality of alternative translation hypotheses generated within search-based Machine Translation (MT) systems. This task, commonly termed "Language Modeling" (LM), has been explored primarily in the context of speech recognition; however, current state-of-the-art language models (LMs) are not effective at distinguishing fluent, grammatical translations from their poor alternatives. In contrast, the proposed approach, "Discriminative Knowledge-Rich Language Modeling" (DKRLM), is explicitly designed to find the most fluent and grammatical translations within the search space by comparing the linguistic features of the translation hypotheses against very large "clean" monolingual corpora. The intuition is that more grammatical translation hypotheses should contain higher proportions of features seen in these large corpora. An important contribution of the project lies in exploring different types of linguistic features to identify those that are most informative for these comparisons. Discriminative training is then performed to combine the features into a system-independent scoring function that replaces traditional LMs in MT systems. The broader impacts of the proposed work include wider adoption of the methodology as well as application of the new DKRLM functions to other search-based NLP applications that aim to generate fluent, grammatical text, including search-based approaches to Speech Recognition, Natural Language Generation (NLG), Optical Character Recognition (OCR), and Summarization, among others.
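As a rough illustration of the comparison and training steps described above, the Python sketch below shows one possible instantiation; it is not the project's method. It assumes the simplest feature type, word n-grams of orders 1 to 3, and represents each hypothesis by the proportion of its n-grams that appear in an index built from a monolingual corpus (the abstract does not specify which linguistic features the project uses). The weights of the linear scoring function are learned with a pairwise perceptron ranking update, swapped in here as a simple stand-in for discriminative training. All function names (`build_corpus_index`, `features`, `score`, `train_pairwise_perceptron`) and the toy corpus are hypothetical.

```python
# Hypothetical sketch of DKRLM-style scoring; not the project's implementation.
# Assumption: features are word n-gram "seen in corpus" proportions (orders 1..3),
# and discriminative training is a pairwise perceptron ranking update.
from collections.abc import Iterable

NGram = tuple[str, ...]


def ngrams(tokens: list[str], n: int) -> list[NGram]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_corpus_index(sentences: Iterable[str], max_n: int = 3) -> list[set[NGram]]:
    """Collect the n-grams (orders 1..max_n) observed in a 'clean' monolingual corpus."""
    index: list[set[NGram]] = [set() for _ in range(max_n)]
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            index[n - 1].update(ngrams(tokens, n))
    return index


def features(hypothesis: str, index: list[set[NGram]]) -> list[float]:
    """One feature per order n: the fraction of the hypothesis's n-grams seen in the corpus."""
    tokens = hypothesis.split()
    feats = []
    for n, seen in enumerate(index, start=1):
        grams = ngrams(tokens, n)
        feats.append(sum(g in seen for g in grams) / len(grams) if grams else 0.0)
    return feats


def score(feats: list[float], weights: list[float]) -> float:
    """Linear scoring function: weighted sum of the features."""
    return sum(f * w for f, w in zip(feats, weights))


def train_pairwise_perceptron(
    pairs: list[tuple[str, str]], index: list[set[NGram]], epochs: int = 10
) -> list[float]:
    """Learn weights from (better, worse) hypothesis pairs with perceptron ranking updates."""
    weights = [0.0] * len(index)
    for _ in range(epochs):
        for better, worse in pairs:
            fb, fw = features(better, index), features(worse, index)
            # Update only when the better hypothesis fails to outscore the worse one.
            if score(fb, weights) <= score(fw, weights):
                weights = [w + b - c for w, b, c in zip(weights, fb, fw)]
    return weights


# Toy usage: a tiny stand-in for a very large clean corpus.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
index = build_corpus_index(corpus)
weights = train_pairwise_perceptron([("the cat sat on the rug", "cat the on sat rug the")], index)
hypotheses = ["the dog sat on the mat", "dog the mat on sat the"]
best = max(hypotheses, key=lambda h: score(features(h, index), weights))
print(weights, best)
```

In this toy run the scrambled hypothesis still scores perfectly on unigrams, so training shifts all the weight onto the bigram and trigram proportions, which is where word order shows up. A realistic setup would need a very large corpus and richer linguistic features; the perceptron is used only because it is the simplest discriminative ranking learner.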

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0713402
Program Officer: Tatiana D. Korelsky
Budget Start: 2007-09-01
Budget End: 2012-08-31
Fiscal Year: 2007
Total Cost: $390,214
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213