Small: RI: Broad-Coverage High-Accuracy Machine Translation into Morphologically-Rich Languages

Lavie, Alon

Abstract

Machine Translation (MT) into morphologically-rich languages poses unique challenges that state-of-the-art approaches have not yet adequately addressed. Even the best available MT systems translating into languages such as Arabic frequently produce output that is disfluent and lacks proper grammatical structure. This project explores novel approaches to these issues by developing a statistical MT framework that incorporates deeper modeling of syntax and morphology. While the methods explored are largely language independent, the research is conducted and experimentally evaluated in the context of a large-scale English-to-Arabic MT system constructed using vast corpora available from the Linguistic Data Consortium (LDC).

The research in this project focuses on two directions. The first is novel approaches for combining syntactic and non-syntactic translation resources that are automatically acquired from vast amounts of parallel data. The second is the exploration of several alternative pathways for integrating the information provided by a high-accuracy morphological analysis and generation engine for Arabic into the MT framework. The project also explores methods for improving the syntax of Arabic MT output using syntactic transfer rules that model syntactic divergences between English and Arabic. The goal is to develop an English-to-Arabic MT system that produces significantly more fluent, grammatical, and accurate Arabic output than the current best systems, as measured by MT evaluation metrics (such as BLEU and METEOR) and as judged by human evaluators.

High-accuracy, fully automatic Machine Translation from English into Arabic has high potential value to the Arabic-speaking population at large, since it opens up access to the English content available on the web. Such high-quality MT into Arabic may also improve access to markets in the Arabic-speaking world for US and international companies.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0915327
Program Officer: Tatiana D. Korelsky

Project Start
Project End
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year
Fiscal Year: 2009
Total Cost: $450,000
Indirect Cost

Carnegie-Mellon University, Pittsburgh, PA, United States
