ITR: Collaborative Research: Interlingual Annotation of Multilingual Text

Levin, Lori; Mitamura, Teruko

Abstract

This multi-site research effort is aimed at developing a coherent, consistent, standardized Interlingual representation along with a methodology and sharable tools for annotating large bilingual corpora of parallel texts. It has four central components: First, six corpora are being compiled, each consisting of a number of texts in a particular source language along with three translations of each text into English. Second, a standardized interlingual representation is being developed based on a comparative analysis of these parallel text corpora. Third, the bilingual corpora are being annotated using the standardized interlingua and following a predefined annotation procedure. Fourth, metrics are being developed for evaluating the accuracy and appropriateness of the interlingual representations in terms of the grain size of the representation given a particular task. The metrics are based on inter-coder reliability, the growth rate of the interlingual representation, and quality of the target language text that is be generated from the interlingua.

The resulting annotated, multilingual, parallel corpora will be useful as an empirical basis for developing a wide variety of interlingual NLP systems for tasks such as machine translation, question answering, web searching, summarization, or presentation generation, as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.

The participants include CRL at NMSU, ISI at USC, UMIACS at the University of Maryland, LTI at CMU, Columbia University, and The MITRE Corporation. The source languages include Arabic, Chinese, French, Hindi, Japanese, Spanish and English.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0325695
Program Officer: Ephraim P. Glinert

Project Start
Project End
Budget Start: 2003-09-01
Budget End: 2005-08-31
Support Year
Fiscal Year: 2003
Total Cost: $168,750
Indirect Cost

ITR: Collaborative Research: Interlingual Annotation of Multilingual Text
Levin, Lori Mitamura, Teruko
Carnegie-Mellon University, Pittsburgh, PA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments