We are developing a syntactic approach to statistical machine translation that extends the tree adjoining grammar (TAG) formalism to translation and frames translation directly as a parsing problem. The model imposes no constraints on entries in the phrasal lexicon, thereby retaining the flexible lexical entries of phrase-based translation systems, and it allows straightforward incorporation of a syntactic language model. The operations used to combine tree fragments into a complete parse tree generalize the standard parsing operations of TAG: they are modified to be highly flexible, potentially allowing any permutation (reordering) of the initial fragments. This flexibility gives the model a great deal of freedom in capturing differences in word order between source and target languages.
The use of flexible parsing operations raises two challenges that are a major focus of our research. First, the models require efficient decoding algorithms. Second, flexible parsing operations allow the model to capture complex reordering phenomena, but they also introduce many spurious possibilities. To address the second challenge, we are investigating learned, probabilistic constraints based on information in the source-language sentence, or in a parse tree for the source-language sentence, thereby incorporating syntactic information from the source language.
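To give a sense of scale for the second challenge, the following minimal sketch enumerates every ordering of a handful of fragments and then filters them with a constraint. It is an illustration only, not the project's model: the fragments are plain strings rather than tree fragments, and the hypothetical `max_jump` filter is a hard distance limit standing in for the learned, probabilistic constraints described above. All names here (`reorderings`, `max_jump`) are assumptions made for the example.

```python
from itertools import permutations


def reorderings(fragments, allowed=None):
    """Yield orderings of `fragments`.

    `allowed` is a hypothetical predicate over a tuple of source
    positions; when it is None, every permutation is produced.
    """
    for order in permutations(range(len(fragments))):
        if allowed is None or allowed(order):
            yield [fragments[i] for i in order]


def max_jump(limit):
    """Hypothetical hard constraint: fragments that end up adjacent in
    the output must come from source positions at most `limit` apart.
    A stand-in for a learned, probabilistic constraint."""
    def check(order):
        return all(abs(a - b) <= limit for a, b in zip(order, order[1:]))
    return check


fragments = ["we", "have", "seen", "the", "film"]
unconstrained = list(reorderings(fragments))
constrained = list(reorderings(fragments, max_jump(2)))

print(len(unconstrained), "orderings without constraints")
print(len(constrained), "orderings with a jump limit of 2")
```

With no constraint the number of orderings grows factorially in the number of fragments, which is why both efficient decoding and a way of ruling out implausible orderings matter. A probabilistic constraint would score orderings rather than discard them outright; the hard filter is used here only to keep the sketch short.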
The end goal of the project is to develop new models for translation that improve the fluency or grammaticality of translations, improve the degree to which semantic information (e.g., predicate-argument structure) is preserved in translation, and improve the treatment of differing word orders between source and target languages.