Statistical approaches to machine translation (MT) have been the subject of a great deal of recent research in natural language processing. A seminal piece of work in the field is that of researchers at IBM in the early 1990s; many of the original concepts of statistical MT were introduced in that work, and much of the more recent work in statistical MT has built upon these ideas.

This project investigates alternatives to the IBM models. In particular, we are exploring the use of global feature-vector representations of translations, along with discriminative training methods. A first aim of the project is to develop a new framework for statistical machine translation. A second aim is to use the approach as a testbed for new features in machine translation systems. Our goal is to use the new approach to make a systematic study of issues such as word-sense disambiguation and the use of syntactic knowledge in statistical MT systems.

Improvements in the quality of machine translation systems could have a major impact on society. Users of the World Wide Web or digital libraries (English-speaking or otherwise) now face a huge amount of text written in foreign languages. Human translation of any significant proportion of this text is not feasible. Machine translation systems could greatly improve our ability to access natural language information in electronic form. In addition, this work will facilitate research on a wide range of features in MT systems, by a wide range of researchers.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0415030
Program Officer: Tatiana D. Korelsky
Budget Start: 2005-01-15
Budget End: 2008-12-31
Fiscal Year: 2004
Total Cost: $324,924
Name: Massachusetts Institute of Technology
City: Cambridge
State: MA
Country: United States
Zip Code: 02139