RI: Parsing Models and Algorithms for Morphologically Rich Languages

Smith, Noah

Abstract

The field of natural language processing (NLP) has, to date, largely focused its efforts on technology for English, even though English is a typological outlier and the majority of the world's people do not speak it. This project aims to develop statistical natural language analysis tools to disambiguate the morphological and syntactic structure of non-English text. Specifically, the objective of the pilot study is to design, train, implement, and disseminate statistical morpho-syntactic parsing models for Arabic and Hebrew. The project starts with a straightforward formalism (statistical head automaton grammars) and makes use of novel discriminative learning methods to build models that can be easily ported to new datasets. Previous work has simplified the problem by assuming perfect morphological disambiguation prior to parsing, but for most languages accurate morphological disambiguation is not yet available; this project aims to integrate morphological disambiguation into the parsing algorithm for better accuracy on both tasks.

Impact: This project will improve global access to information by directly advancing core language processing technology in languages spoken by more than half a billion people and, because of the language-portability principle, by facilitating future work on many more languages. It is expected that this project will improve the state of the art in parsing accuracy for the languages under consideration, and the models and algorithms developed will be made freely available for research purposes. These tools are expected to aid researchers working on applied technologies such as machine translation and multilingual information extraction.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713265
Program Officer: Tatiana D. Korelsky

Budget Start: 2007-09-01
Budget End: 2009-08-31
Fiscal Year: 2007
Total Cost: $112,000

Institution: Carnegie-Mellon University, Pittsburgh, PA, United States
