The field of natural language processing (NLP) has, to date, focused largely on technology for English, even though English is a typological outlier and most of the world's people do not speak it. This project aims to develop statistical natural language analysis tools that disambiguate the morphological and syntactic structure of non-English text. Specifically, the objective of the pilot study is to design, train, implement, and disseminate statistical morpho-syntactic parsing models for Arabic and Hebrew. The project starts with a straightforward formalism (statistical head automaton grammars) and uses novel discriminative learning methods to build models that can be easily ported to new datasets. Previous work has simplified the problem by assuming perfect morphological disambiguation prior to parsing, but for most languages accurate morphological disambiguation is not yet available. This project therefore aims to integrate morphological disambiguation into the parsing algorithm, for better accuracy on both tasks.

Impact: This project will improve global access to information by directly advancing core language processing technology in languages spoken by more than half a billion people and, because of the language-portability principle, by facilitating future work on many more languages. The project is expected to improve the state of the art in parsing accuracy for the languages under consideration, and the models and algorithms developed will be made freely available for research purposes. These tools are expected to aid researchers working on applied technologies such as machine translation and multilingual information extraction.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0713265
Program Officer: Tatiana D. Korelsky
Budget Start: 2007-09-01
Budget End: 2009-08-31
Fiscal Year: 2007
Total Cost: $112,000
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213