A key aim in Natural Language Processing is to robustly map natural language sentences to formal representations of their underlying meaning. Recent work has addressed this problem by learning semantic parsers from sentences paired with logical meaning representations. The goal of this project is to develop models and learning algorithms for recovering lexical structure, in the context of mapping sentences to logical form. This work is inspired by linguistic theories of the lexicon, but directly motivated by the limitations observed in current, state-of-the-art learning algorithms.

The central hypothesis is that a new probabilistic learning approach for lexical generalization can simultaneously achieve the goals of (1) language-independent learning, (2) robustness when analyzing natural, unedited text, and (3) reduced data annotation effort, in a computationally efficient manner that will scale to large learning problems. The approach under development induces a Combinatory Categorial Grammar (CCG) that is modified to replace the traditional, explicit list of lexical items in the lexicon with a distribution over lexical items, allowing significant generalization in the construction of possible syntactic and semantic structures for given input words. Modifying the CCG lexicon in this manner greatly increases the potential to generalize from the available training data without sacrificing the scalability that comes from working within an established grammar formalism for which efficient learning and parsing algorithms have been developed. This work will have impact at the algorithmic level and through applications, including advanced natural language interfaces to databases for non-technical users.

Project Report

A key aim in Natural Language Processing is to robustly map natural language sentences to formal representations of their underlying meaning. Recent work has addressed this problem by learning semantic parsers from sentences paired with logical meaning representations. In this project, we developed new models and learning algorithms for recovering lexical structure, in the context of mapping sentences to logical form. The approach was inspired by linguistic theories of the lexicon, but directly motivated by the limitations observed in current, state-of-the-art learning algorithms. As proposed, we showed that a new probabilistic learning approach for lexical generalization can simultaneously achieve the goals of (1) language-independent learning, (2) robustness when analyzing natural, unedited text, and (3) reduced data annotation effort, in a computationally efficient manner that scales to large learning problems. The algorithm we designed induced a Combinatory Categorial Grammar (CCG) that was modified to replace the traditional, explicit list of lexical items in the lexicon with a distribution over lexical items, allowing significant generalization in the construction of possible syntactic and semantic structures for given input words. Modifying the CCG lexicon in this manner greatly increased the potential to generalize from the available training data without sacrificing the scalability that comes from working within an established grammar formalism for which efficient learning and parsing algorithms have been developed. This work had impact at the algorithmic level and through applications, including achieving state-of-the-art performance on a number of benchmark datasets for evaluating advanced natural language interfaces to databases for non-technical users.
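To make the lexical generalization idea concrete, the minimal Python sketch below shows one way a distribution over lexical items can be generated rather than listed explicitly, in the spirit of factored CCG lexicons: word-specific lexemes are crossed with shared syntactic/semantic templates, and a log-linear model assigns each generated entry a probability. All names, templates, and weights here are illustrative assumptions, not the project's actual implementation.

import math
from itertools import product

# Hypothetical factored lexicon: words map to lexemes (tuples of logical
# constants), which combine with shared templates to yield lexical entries.
LEXEMES = {
    "flights": [("flight",)],
    "boston":  [("boston",)],
}

# Each template pairs a CCG category with a logical-form builder.
TEMPLATES = [
    ("N",    lambda c: f"lambda x.{c[0]}(x)"),
    ("NP",   lambda c: f"{c[0]}"),
    ("N\\N", lambda c: f"lambda f.lambda x.(f(x) & to(x,{c[0]}))"),
]

# Toy feature weights on (word, category) pairs; in practice these would
# be estimated from sentences paired with logical forms.
WEIGHTS = {
    ("flights", "N"):   2.0,
    ("boston", "NP"):   1.5,
    ("boston", "N\\N"): 1.0,
}

def lexical_entries(word):
    """Generate (category, logical form, probability) entries for a word
    by crossing its lexemes with every template, then normalizing the
    log-linear scores into a distribution over the generated entries."""
    candidates = []
    for constants, (category, build) in product(LEXEMES.get(word, []), TEMPLATES):
        score = WEIGHTS.get((word, category), 0.0)
        candidates.append((category, build(constants), score))
    z = sum(math.exp(s) for _, _, s in candidates) or 1.0
    return [(cat, lf, math.exp(s) / z) for cat, lf, s in candidates]

for entry in lexical_entries("boston"):
    print(entry)

Because entries are generated on demand, a word like "boston" receives candidate analyses (NP, noun modifier, etc.) it may never have been observed with in training, which is the source of the generalization the paragraph above describes; a parser would then select among these weighted candidates during search.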

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1115966
Program Officer: Tatiana D. Korelsky
Budget Start: 2011-08-01
Budget End: 2014-07-31
Fiscal Year: 2011
Total Cost: $300,000
Name: University of Washington
City: Seattle
State: WA
Country: United States
Zip Code: 98195