Active Selection of Data for Machine Translation

Levin, Lori; Frederking, Robert

Abstract

Current methods for machine translation (MT) rely on large amounts of text data. However, large amounts of data are not available for many languages, or for specialized vocabularies even in major languages. This project elicits bilingual data from a fairly naive human bilingual informant. Bilingual speakers are available for a language even when large amounts of data and trained linguists are not. A Corpus Navigator uses knowledge from language typology to choose the pieces of data that are most valuable for automatic learning of MT rules. The Corpus Navigator employs active learning in the sense that its state is updated by eliciting data from a human translator.

Two hypotheses are being tested: that an MT system can get by with less data if it is the right data, and that the right data can be acquired through an active learning process guided by linguistic knowledge. Current government-run MT evaluations provide a testbed for these hypotheses. The outputs of MT systems trained on different data sets are compared in order to determine whether the hypotheses are correct. An initial prototype Corpus Navigator is being produced as a proof of concept.

This project will make it easier to build MT systems in situations where large text resources are not available. Languages that will be tested may include Inupiaq, Bengali, Thai, Urdu, Uzbek, and Tigrinya. The output of Corpus Navigation is a parallel, word-aligned corpus annotated with a semantic feature structure. This data will be available to other researchers.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713292
Program Officer: Tatiana D. Korelsky

Budget Start: 2007-09-15
Budget End: 2009-08-31
Fiscal Year: 2007
Total Cost: $156,000

Institution

Carnegie-Mellon University, Pittsburgh, PA, United States
