This multi-site research effort is aimed at developing a coherent, consistent, standardized interlingual representation along with a methodology and sharable tools for annotating large bilingual corpora of parallel texts. It has four central components. First, six corpora are being compiled, each consisting of a number of texts in a particular source language along with three translations of each text into English. Second, a standardized interlingual representation is being developed based on a comparative analysis of these parallel text corpora. Third, the bilingual corpora are being annotated using the standardized interlingua and following a predefined annotation procedure. Fourth, metrics are being developed for evaluating the accuracy and appropriateness of the interlingual representations in terms of the grain size of the representation given a particular task. The metrics are based on inter-coder reliability, the growth rate of the interlingual representation, and the quality of the target language text that is generated from the interlingua.

The resulting annotated, multilingual, parallel corpora will be useful as an empirical basis for developing a wide variety of interlingual NLP systems for tasks such as machine translation, question answering, web searching, summarization, or presentation generation, as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

The participants include CRL at NMSU, ISI at USC, UMIACS at the University of Maryland, LTI at CMU, Columbia University, and The MITRE Corporation. The source languages include Arabic, Chinese, French, Hindi, Japanese, Spanish, and English.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0325695
Program Officer: Ephraim P. Glinert
Project Start:
Project End:
Budget Start: 2003-09-01
Budget End: 2005-08-31
Support Year:
Fiscal Year: 2003
Total Cost: $168,750
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213