Natural language translation remains a crucial problem: solutions are expensive, slow to develop, and difficult to scale. While automated approaches often convey the gist of a text, fully automated high-quality translation remains far out of reach for the vast majority of the world's languages. A variety of projects are now emerging that tap into the Web-based community of people willing to help translate, but bilingual expertise is rare among the available volunteers. This project will investigate whether a combination of machine translation and human participants who speak only a single language (i.e., monolingual speakers) can produce high-quality translation. The research is organized around the development of an iterative protocol that combines elements of machine translation, human and semi-automated language annotation, and human correction, motivated by concepts in information theory and discourse analysis. This research framework will support both synchronous and asynchronous pairwise interaction among human participants, as well as a "bag of tasks" approach that permits truly distributed human computation.

With respect to broader impacts, this project is among the first to investigate the potential of hybrid human/machine translation involving non-bilingual human participants, combining practical implementation with empirically driven experimentation. If successful, this project will lower the barrier to translation of natural languages, resulting in a widely available approach that offers high-quality translation for an unprecedentedly wide range of language pairs while reducing the need for, and cost of, bilingual expertise. The technology to be developed will be evaluated on a real-world problem: translation of books within the (previously NSF-funded) International Children's Digital Library (ICDL) project (www.childrenslibrary.org). The ICDL currently contains 4,000 books in 60 languages and has an active user population that includes 1,000 volunteers with differing language skills who are interested in helping with translation. Participants in Mexico, Romania, Mongolia, and the U.S. will act as early adopters in K-12 educational settings, supporting the ICDL's goal of enabling greater shared cultural understanding through this existing and growing resource.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 0941455
Program Officer: Elizabeth Tran
Budget Start: 2009-10-01
Budget End: 2013-09-30
Fiscal Year: 2009
Total Cost: $630,000

Name: University of Maryland College Park
City: College Park
State: MD
Country: United States
Zip Code: 20742