RI: Small: Statistical Machine Translation Through a Tree Adjoining Grammar with Flexible Parsing Operations

Collins, Michael

Abstract

Our research develops a syntactic approach to statistical machine translation that extends a tree adjoining grammar (TAG) formalism to the translation problem and frames translation directly as a parsing problem. The model imposes no constraints on entries in the phrasal lexicon, thereby retaining the flexible lexical entries of phrase-based translation systems, and it allows straightforward incorporation of a syntactic language model. The operations used to combine tree fragments into a complete parse tree generalize the standard parsing operations found in TAG: they are modified to be highly flexible, potentially allowing any permutation (reordering) of the initial fragments. This gives the model a great deal of freedom in capturing differences in word order between source and target languages.

The use of flexible parsing operations raises two challenges that are a major focus of our research. First, the models require efficient decoding algorithms. Second, flexible parsing operations allow the model to capture complex reordering phenomena but also introduce many spurious possibilities. To address the second challenge, we are investigating the use of learned, probabilistic constraints based on information in the source-language sentence, or in a parse tree for the source-language sentence, thereby incorporating syntactic information from the source language.

The end goal of the project is to develop new models for translation that improve the fluency or grammaticality of translations, improve the degree to which semantic information (e.g., predicate-argument structure) is preserved in translation, and improve the treatment of differing word orders between source and target languages.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0915176
Program Officer: Tatiana D. Korelsky

Project Start
Project End
Budget Start: 2009-09-01
Budget End: 2011-12-31
Support Year
Fiscal Year: 2009
Total Cost: $450,000
Indirect Cost

Institution

Massachusetts Institute of Technology, Cambridge, MA, United States
