Natural language processing (NLP) is a key technology for the digital age. At the core of most NLP systems is a parser, a program that identifies the grammatical structure of sentences. Parsing is an essential prerequisite for language understanding. But despite significant progress in recent decades, accurate wide-coverage parsing for any genre or language remains an unsolved problem. The broader impact of this CAREER project will be to advance the state of the art in NLP technology through the development of more accurate statistical parsing models.

Since language is highly ambiguous, parsers require a statistical model that assigns the highest probability to the correct structure of each sentence. The accuracy of current parsers is limited by the amount of labeled data available to train their models, and by the amount of information the models take into account. This CAREER project aims to advance parsing by developing novel methods of indirect supervision to overcome the lack of labeled training data, as well as new kinds of models that incorporate information about the prior linguistic context in which sentences appear. It employs Bayesian techniques, which give robust estimates and allow rich parametrization, and applies them to lexicalized grammars, which provide a compact representation of the syntactic properties of a language. This CAREER project will also train graduate students in natural language processing and develop materials that can be used to teach middle and high school students about NLP and to inspire them to pursue an education in computer science.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1053856
Program Officer: Tatiana Korelsky
Budget Start: 2011-02-01
Budget End: 2018-01-31
Fiscal Year: 2010
Total Cost: $500,001
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820