Research in machine translation of human languages has made substantial progress recently, and surface patterns gleaned automatically from online bilingual texts work remarkably well for some language pairs. However, for many language pairs, the output of even the best systems is garbled, ungrammatical, and difficult to interpret. Chinese-to-English systems need particular improvement, despite the importance of this language pair, while English-to-Chinese translation, equally important for communication between individuals, is rarely studied. This project develops methods for automatically learning correspondences between Chinese and English at a semantic rather than surface level, allowing machine translation to benefit from recent work in semantic analysis of text and natural language generation. One part of this work determines what types of semantic analysis of source-language sentences can best inform a translation system, focusing on dropped arguments, co-reference links, and discourse relations between clauses. These linguistic phenomena must generally be made more explicit when translating from Chinese to English. A second part of the work integrates natural language generation into statistical machine translation, using generation technology to determine sentence boundaries, the ordering of constituents, and the production of function words that translation systems tend to get wrong. A third part develops and compares algorithms for training and decoding machine translation models defined on semantic representations. All of this research exploits newly developed linguistic resources for semantic analysis of both Chinese and English.

The ultimate benefits of improved machine translation technology are easier access to information and easier communication between individuals. These in turn lead to increased opportunities for trade, as well as better understanding between cultures. This project's systems for both Chinese-to-English and English-to-Chinese translation are developed with the expectation that the approaches will be applied to other language pairs in the future.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0910992
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $560,000
Indirect Cost:
Name: University of Colorado at Boulder
Department:
Type:
DUNS #:
City: Boulder
State: CO
Country: United States
Zip Code: 80309