STIMULATE: Generalized Example-Based Machine Translation

Carbonell, Jaime

Abstract

Example-based machine translation (EBMT) searches a parallel corpus of pre-translated texts for the closest match to each new sentence being translated. Traditional EBMT works well only when there is a very large relevant parallel corpus (e.g. over 200 MB). The proposed investigation extends EBMT by generalizing words into semantic equivalence classes, by syntactic canonicalization of the source and target corpora, and by composing multiple partial matches rather than selecting a single "best" match. These new methods will be evaluated on at least Spanish-English and Korean-English machine translation. Generalized EBMT promises to produce significantly higher-accuracy translations than traditional EBMT given a training corpus of the same size, or alternatively to produce translations of equivalent quality from a corpus an order of magnitude smaller. The inherently brief development cycle of EBMT, combined with the much smaller bilingual corpus requirement, makes generalized EBMT the future technology of choice for rapid deployment of machine translation to new, possibly exotic, language pairs.
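
The idea of generalizing words into semantic equivalence classes can be illustrated with a minimal sketch. It is not the project's implementation: the class lexicon, the two example pairs, and the function names `generalize_pair` and `translate` are all hypothetical, and the sketch uses only exact matching on the generalized form. Syntactic canonicalization and the composition of multiple partial matches are not covered.

```python
# Hypothetical toy data: each class member maps to (class tag, target-language word).
CLASS_LEXICON = {
    "lunes": ("<DAY>", "Monday"),
    "viernes": ("<DAY>", "Friday"),
    "londres": ("<CITY>", "London"),
    "roma": ("<CITY>", "Rome"),
}

# A tiny pretend Spanish-English parallel corpus (assumed, not from the project).
EXAMPLES = [
    ("la fiesta es el lunes", "the party is on Monday"),
    ("vivo en londres", "I live in London"),
]


def generalize_pair(source, target):
    """Replace class members with their class tag on both sides of an example."""
    src_tokens = []
    for word in source.split():
        if word in CLASS_LEXICON:
            tag, translation = CLASS_LEXICON[word]
            src_tokens.append(tag)
            target = target.replace(translation, tag, 1)
        else:
            src_tokens.append(word)
    return " ".join(src_tokens), target


# Generalized source template -> generalized target template.
TEMPLATES = dict(generalize_pair(s, t) for s, t in EXAMPLES)


def translate(sentence):
    """Translate via an exact match on the generalized form, or return None."""
    tokens, fillers = [], []
    for word in sentence.lower().split():
        if word in CLASS_LEXICON:
            tag, translation = CLASS_LEXICON[word]
            tokens.append(tag)
            fillers.append((tag, translation))
        else:
            tokens.append(word)
    template = TEMPLATES.get(" ".join(tokens))
    if template is None:
        return None
    # Simplifying assumption: repeated tags appear in the same order in both languages.
    for tag, translation in fillers:
        template = template.replace(tag, translation, 1)
    return template


print(translate("la fiesta es el viernes"))
print(translate("vivo en roma"))
```

Neither input sentence appears in the toy corpus, yet each matches a stored example once the day or city name is replaced by its class tag. This is the sense in which generalization lets a smaller corpus cover more inputs.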

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9618941
Program Officer: Ephraim P. Glinert

Budget Start: 1997-03-01
Budget End: 2001-02-28
Fiscal Year: 1996
Total Cost: $723,304

Institution: Carnegie-Mellon University, Pittsburgh, PA, United States
