Recent years have witnessed significant improvements in automatic machine translation systems. An enabling factor behind this rapid progress is the adoption of automatic evaluation metrics, which guide developers in improving translation systems. However, current metrics are only a coarse approximation of human judgments of system outputs. This SGER project is exploring ways of improving current metrics on three fronts. First, judgments of syntactic similarity between system outputs and human references are being incorporated into the metric alongside phrasal similarities. Second, machine learning techniques are being applied to find characteristics that correlate with human judgments of quality. Third, the project is developing finer-grained analyses of what is wrong, as well as what is right, with system outputs, based on both their syntactic and lexical characteristics. To develop a learnable, syntax-aware metric, discriminative methods will be generalized to complex, structured problems. A more informative automatic evaluation metric will lead to more rapid improvements in machine translation technology, enabling access to archives from centuries past and documents from around the world. All relevant software and non-proprietary data will be distributed widely. Written reports of experimental results and findings will be disseminated through academic publications.
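
As an illustration of how a learnable metric might combine lexical and syntactic evidence, the sketch below is a hypothetical example, not the project's actual method: the features, training data, and weights are all invented. It fits a linear scorer over a lexical n-gram overlap feature and a syntactic-similarity feature to simulated human adequacy judgments using stochastic gradient descent:

    # Hypothetical sketch of a learned MT evaluation metric: a lexical
    # feature (n-gram overlap) and a syntactic-similarity feature are
    # combined with weights fit to human adequacy judgments.

    def ngram_overlap(candidate, reference, n=2):
        """Fraction of candidate n-grams that also appear in the reference."""
        def ngrams(text):
            tokens = text.split()
            return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        cand, ref = ngrams(candidate), ngrams(reference)
        return len(cand & ref) / len(cand) if cand else 0.0

    # Invented training data: (lexical overlap, syntactic similarity)
    # feature pairs with a human adequacy judgment on a 0-1 scale.
    data = [
        ((0.8, 0.9), 0.85),
        ((0.6, 0.4), 0.50),
        ((0.3, 0.7), 0.55),
        ((0.2, 0.1), 0.15),
    ]

    def fit_linear(examples, lr=0.1, epochs=2000):
        """Fit [w_lexical, w_syntactic, bias] by stochastic gradient descent."""
        w = [0.0, 0.0, 0.0]
        for _ in range(epochs):
            for (lex, syn), human in examples:
                err = w[0] * lex + w[1] * syn + w[2] - human
                w[0] -= lr * err * lex
                w[1] -= lr * err * syn
                w[2] -= lr * err
        return w

    w = fit_linear(data)
    lex = ngram_overlap("the cat sat on the mat", "the cat is on the mat")
    syn = 0.8  # placeholder: a real metric would compare parse trees here
    print(f"learned weights: {w}")
    print(f"metric score: {w[0] * lex + w[1] * syn + w[2]:.3f}")

In an actual syntax-aware metric, the second feature would be derived from comparing parses of the candidate and reference rather than supplied as a constant; that parse-level signal is what the project's first front targets.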

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant
Application #: 0612791
Program Officer: Tatiana D. Korelsky
Budget Start: 2006-06-01
Budget End: 2008-05-31
Fiscal Year: 2006
Total Cost: $205,199
Name: University of Pittsburgh
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213