Machine Translation (MT) into morphologically rich languages poses unique challenges that state-of-the-art approaches have not yet adequately addressed. Even the best available systems for translating into languages such as Arabic frequently produce output that is disfluent and lacks proper grammatical structure. This project addresses these issues by developing a statistical MT framework that incorporates deeper modeling of syntax and morphology. While the methods explored are largely language independent, the research is conducted and experimentally evaluated within a large-scale English-to-Arabic MT system built from large corpora available from the Linguistic Data Consortium (LDC).

The research in this project focuses on two directions. The first is novel approaches for combining syntactic and non-syntactic translation resources that are automatically acquired from large amounts of parallel data. The second is exploring several alternative pathways for integrating information from a high-accuracy morphological analysis and generation engine for Arabic into the MT framework. The project also explores methods for improving the syntax of Arabic MT output using syntactic transfer rules that model syntactic divergences between English and Arabic. The goal is to develop an English-to-Arabic MT system that produces significantly more fluent, grammatical, and accurate Arabic output than the current best systems, as measured by MT evaluation metrics (such as BLEU and METEOR) and as judged by human evaluators.

High-accuracy, fully automatic Machine Translation from English into Arabic could be of great value to the Arabic-speaking population at large by opening up access to English content available on the web. Such high-quality MT into Arabic may also improve access to markets in the Arabic-speaking world for US and international companies.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0915327
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $450,000
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213