RI: Small: RUI: Resource-light Morphosyntactic Tagging of Morphologically Complex Languages

Feldman, Anna

Abstract

This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

The main goal of this project is to develop a tagging method that neither relies on target-language training data nor requires bilingual dictionaries or parallel corpora. The main assumption is that a model for the target language can be approximated by language models from one or more related source languages.
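The assumption above can be illustrated with a minimal sketch. It is not the project's actual method, which the abstract does not specify: it estimates tag-transition probabilities from two tagged source-language corpora and mixes them into an approximate target-language model. The function names, the tags (N, V, A), the toy corpora, and the equal mixture weights are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def transition_probs(tagged_sentences):
    """Estimate P(tag | previous tag) by relative frequency.

    Each sentence is a list of (word, tag) pairs; "<s>" marks sentence start.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"
        for _word, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
    return {
        prev: {tag: n / sum(nxt.values()) for tag, n in nxt.items()}
        for prev, nxt in counts.items()
    }


def mix_models(models, weights):
    """Combine source-language transition models as a weighted mixture.

    For each context, the weights are renormalised over the models that
    actually observed that context, so every row still sums to 1.
    """
    mixed = defaultdict(lambda: defaultdict(float))
    contexts = set().union(*(m.keys() for m in models))
    for prev in contexts:
        total = sum(w for m, w in zip(models, weights) if prev in m)
        for m, w in zip(models, weights):
            for tag, p in m.get(prev, {}).items():
                mixed[prev][tag] += (w / total) * p
    return {prev: dict(nxt) for prev, nxt in mixed.items()}


# Hypothetical toy corpora standing in for two related source languages.
source_a = [[("w1", "N"), ("w2", "V")], [("w3", "A"), ("w4", "N")]]
source_b = [[("w1", "N"), ("w2", "V")], [("w3", "N"), ("w4", "N")]]

# Approximate target-language model; no target-language training data is used.
target_model = mix_models(
    [transition_probs(source_a), transition_probs(source_b)],
    weights=[0.5, 0.5],  # assumed equal similarity to the target language
)
```

In a setting like the one the abstract describes, the mixture weights would presumably reflect how similar each source language is to the target; here they are fixed by hand.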

Exploiting cross-lingual correspondence leads to a better understanding of 1) which linguistic properties are crucial for morphosyntactic transfer; 2) how to measure language similarity at different levels (syntax, lexicon, morphology); 3) how this method applies to language pairs that do not belong to the same family; 4) what determines the success of the model; and 5) how to quantify its potential for a given language pair. By exploiting cross-language relationships, the size, and hence the cost, of the training data is significantly reduced.

This project is a new cross-fertilization between theoretical linguistics (especially typology and diachronic linguistics) and natural language processing. The practical contribution is a robust and portable system for tagging resource-poor languages. With this new approach, it becomes possible to rapidly deploy tools to analyze a suddenly critical language. This approach can also enhance NSF's initiatives in documenting endangered, low-density languages, as it leverages exactly the type of knowledge that a field linguist and a native speaker could provide. Additional benefits include high-quality annotated data, automatically derived multilingual lexicons, annotation schemes for new languages, new typological generalizations, and graduate and undergraduate researchers with significant experience of highly practical work on difficult and underrepresented languages.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0916280
Program Officer: Tatiana D. Korelsky

Budget Start: 2009-09-01
Budget End: 2013-08-31
Fiscal Year: 2009
Total Cost: $169,174

Institution: Montclair State University, Montclair, NJ, United States
