Our research develops a syntactic approach to statistical machine translation that extends the tree adjoining grammar (TAG) formalism to translation and frames translation directly as a parsing problem. The model imposes no constraints on entries in the phrasal lexicon, thereby retaining the flexible lexical entries of phrase-based translation systems, and it allows straightforward incorporation of a syntactic language model. The operations used to combine tree fragments into a complete parse tree generalize the standard parsing operations of TAG: they are modified to be highly flexible, potentially allowing any permutation (reordering) of the initial fragments. This gives the model a great deal of freedom in capturing differences in word order between source and target languages.

The use of flexible parsing operations raises two challenges that are a major focus of our research. First, the models require efficient decoding algorithms. Second, although flexible parsing operations allow the model to capture complex reordering phenomena, they also introduce many spurious reorderings. To limit these, we are investigating learned, probabilistic constraints based on information in the source-language sentence, or in a parse tree for the source-language sentence, thereby incorporating syntactic information from the source language.
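
To make the second challenge concrete, the following is a minimal Python sketch and not the project's model or decoder. It assumes that fragments are simply tagged with their source-side position, and it uses a hypothetical scoring function, `toy_log_prob`, as a stand-in for a learned probabilistic constraint. All function names and the threshold are illustrative assumptions. It shows only that unrestricted reordering of n fragments yields n! candidates, and that a source-informed score can discard most of them.

```python
from itertools import permutations
from math import factorial


def candidate_orderings(fragments):
    """Enumerate every reordering of the fragments (n! of them)."""
    return list(permutations(fragments))


def toy_log_prob(ordering):
    """Hypothetical stand-in for a learned constraint: penalize each
    fragment by how far it moves from its source-side position."""
    return -sum(
        abs(src_pos - new_pos)
        for new_pos, (src_pos, _) in enumerate(ordering)
    )


def prune(orderings, score, threshold):
    """Keep only the orderings whose score meets the threshold."""
    return [o for o in orderings if score(o) >= threshold]


# Toy fragments, each tagged with its position in the source sentence.
fragments = [(0, "A"), (1, "B"), (2, "C"), (3, "D")]

orderings = candidate_orderings(fragments)
kept = prune(orderings, toy_log_prob, threshold=-2)

print(len(orderings), "of", factorial(len(fragments)), "orderings enumerated")
print(len(kept), "orderings survive pruning")
```

With four fragments there are 24 orderings, and this toy threshold keeps only the original order and the three swaps of adjacent fragments. A real system would not enumerate permutations explicitly, which is why efficient decoding is listed as a separate challenge.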

The end goal of the project is to develop new models for translation that improve the fluency or grammaticality of translations, improve the degree to which semantic information (e.g., predicate-argument structure) is preserved in translation, and improve the treatment of differing word orders between source and target languages.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0915176
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2011-12-31
Support Year:
Fiscal Year: 2009
Total Cost: $450,000
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139