Statistical machine translation systems have seen increasing success in recent years, owing to improved statistical methods and larger quantities of training data. However, their ability to generalize has been limited by overly simplistic representations that are not sensitive to linguistic structure. This project investigates techniques for syntax-aware machine translation in three directions: (1) extending monolingual grammar induction models to the multilingual case, (2) designing discriminative tree-to-tree models that account for structural divergences between languages, and (3) building efficient inference systems for such models.
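To make the tree-to-tree idea concrete, the following is a minimal illustrative sketch, not the project's actual model: a toy transfer step in which hypothetical reordering rules permute the children of source-tree nodes to capture a structural divergence (here, a head-final target order), and a toy lexicon substitutes target words. The rule table, lexicon, and glosses are invented for illustration.

```python
# Toy tree-to-tree transfer sketch (hypothetical rules and lexicon, for
# illustration only). A tree is (label, children); a leaf is a word string.

def is_leaf(t):
    return isinstance(t, str)

# Hypothetical reordering rules: (node label, arity) -> permutation of
# child indices. Here a 2-child VP "V NP" becomes "NP V" (head-final).
RULES = {
    ("VP", 2): [1, 0],
}

# Toy word-for-word lexicon (invented glosses).
LEXICON = {"John": "John", "sees": "miru", "Mary": "Mary-o"}

def transfer(tree):
    """Recursively reorder children per RULES and translate leaves."""
    if is_leaf(tree):
        return LEXICON.get(tree, tree)
    label, children = tree
    order = RULES.get((label, len(children)), list(range(len(children))))
    return (label, [transfer(children[i]) for i in order])

def words(tree):
    """Read off the leaf yield of a tree, left to right."""
    if is_leaf(tree):
        return [tree]
    return [w for child in tree[1] for w in words(child)]

# Example: an SVO source tree is mapped to a verb-final target order.
src = ("S", [("NP", ["John"]), ("VP", [("V", ["sees"]), ("NP", ["Mary"])])])
print(" ".join(words(transfer(src))))  # John Mary-o miru
```

A discriminative model of the kind the project describes would score many competing rule applications rather than apply one deterministic table, but the data structures involved are of this shape.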

This project aims to develop general models and algorithms for syntactic translation, as well as to build end-to-end translation systems. A particular focus is on language pairs for which monolingual resources exist but large bilingual training texts are unavailable. Another focus is exploring how monolingual algorithms can be extended or applied to the multilingual case. In addition to the production of new translation systems and algorithms, a central goal is to develop and make available educational materials suitable for use in training undergraduate and graduate students in the areas of machine translation specifically and artificial intelligence more broadly.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0643742
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2006-12-15
Budget End: 2012-11-30
Support Year:
Fiscal Year: 2006
Total Cost: $500,000
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704