Current approaches in statistical machine translation (MT) miss a key fact: the source-language sentence is not the only way the author's meaning could have been expressed. The idea that the source sentence is just one of various "packagings" of an underlying meaning was, of course, one familiar motivation for interlingual approaches to translation. However, interlingual semantic representations have generally been abandoned: they are notoriously difficult to define, and equally difficult to obtain accurately and with broad coverage once defined. In this project, we are revisiting the idea of "packagings" of meaning, but exploring it in practical ways consistent with current practice in statistical MT. Unlike semantic transfer or interlingual approaches, we encode alternatives as source paraphrase lattices, a representation that allows us to exploit generalizations about the source language while still maintaining the surface-to-surface orientation that characterizes the statistical state of the art. Our exploratory work focuses on capturing syntactic and semantic variation using Lexicalized Well-Founded Grammars (LWFG), a recent formalism that balances expressiveness with practical and provable learnability results. We are quantifying and characterizing the information available in source paraphrase lattices, assessing the value of shallow paraphrasing, and exploring the relative promise of deeper techniques for source paraphrase generation using LWFG and other constraint-based grammatical frameworks. The ability to capture generalizations via source paraphrase may open new possibilities in the translation of minority and endangered languages, which lack training corpora on the scale needed to support standard statistical MT techniques.
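
To make the idea of a source paraphrase lattice concrete, here is a minimal sketch under stated assumptions. The abstract does not specify a lattice format, so this assumes a plain word lattice: a directed acyclic graph whose integer nodes are numbered in topological order and whose edges carry source-language phrases. The class `ParaphraseLattice`, the `EXAMPLE_EDGES` data, and the example sentences are all hypothetical, invented for illustration only. Each start-to-end path is one alternative "packaging" of the same meaning; a whole-span edge can hold a paraphrase, such as a passive reordering, that does not decompose into local substitutions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical edge format: (from_node, to_node, source_phrase).
# Node 0 is the start; the largest node index is the end.
Edge = Tuple[int, int, str]

# Hypothetical example: one source sentence plus invented paraphrases.
EXAMPLE_EDGES: List[Edge] = [
    (0, 1, "the committee"), (0, 1, "the panel"),
    (1, 2, "approved"), (1, 2, "signed off on"),
    (2, 3, "the plan"),
    # A whole-span edge for a reordered (passive) paraphrase.
    (0, 3, "the plan was approved by the committee"),
]


class ParaphraseLattice:
    """Minimal illustrative word lattice of source-side paraphrase alternatives."""

    def __init__(self, edges: List[Edge]) -> None:
        self.out: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
        for src, dst, phrase in edges:
            # Forward-only edges guarantee the graph is acyclic.
            if dst <= src:
                raise ValueError("edges must point forward in node order")
            self.out[src].append((dst, phrase))
        self.final = max(dst for _, dst, _ in edges)

    def paths(self, node: int = 0) -> List[str]:
        """Enumerate every source string encoded from `node` to the end."""
        if node == self.final:
            return [""]
        results: List[str] = []
        for dst, phrase in self.out[node]:
            for rest in self.paths(dst):
                results.append(f"{phrase} {rest}".strip())
        return results

    def count_paths(self) -> int:
        """Count start-to-end paths by dynamic programming, without enumerating them."""
        counts = {self.final: 1}
        # Visit nodes in reverse topological order so successors are counted first.
        for node in range(self.final - 1, -1, -1):
            counts[node] = sum(counts[dst] for dst, _ in self.out[node])
        return counts[0]


if __name__ == "__main__":
    lattice = ParaphraseLattice(EXAMPLE_EDGES)
    for sentence in lattice.paths():
        print(sentence)
    print(lattice.count_paths(), "alternative source strings")
```

Enumerating paths is only for inspection here. The number of paths can grow exponentially with sentence length while the lattice itself stays small, which is why a lattice-aware decoder would typically search the lattice directly rather than translate each string separately.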

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0838801
Program Officer: Tatiana D. Korelsky
Budget Start: 2008-09-01
Budget End: 2010-08-31
Fiscal Year: 2008
Total Cost: $160,763
Name: University of Maryland College Park
City: College Park
State: MD
Country: United States
Zip Code: 20742