A key aim in Natural Language Processing is to robustly map from natural language sentences to formal representations of their underlying meaning. Recent work has addressed this problem by learning semantic parsers given sentences paired with logical meaning representations. The goal of this project is to develop models and learning algorithms for recovering lexical structure, in the context of mapping sentences to logical form. This work is inspired by linguistic theories of the lexicon, but directly motivated by the limitations observed in current, state-of-the-art learning algorithms.
The central hypothesis is that a new probabilistic learning approach for lexical generalization can simultaneously achieve the goals of (1) language-independent learning, (2) robustness when analyzing natural, unedited text, and (3) reduced data annotation effort, in a computationally efficient manner that will scale to large learning problems. The approach under development induces a Combinatory Categorial Grammar (CCG) that is modified to replace the traditional, explicit list of lexical items in the lexicon with a distribution over lexical items, allowing significant generalization in the construction of possible syntactic and semantic structures for given input words. Modifying the CCG lexicon in this manner greatly increases the potential to generalize from the available training data without sacrificing the scalability that comes from working within an established grammar formalism for which efficient learning and parsing algorithms have been developed. This work will have impact at the algorithmic level and through applications, including advanced natural language interfaces to databases for non-technical users.
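The core idea above can be sketched in a few lines: instead of storing a hard-coded list of lexical items for each word, the lexicon assigns every candidate (category, logical form) pair a probability under a log-linear model. The inventory of categories, logical-form templates, and feature names below are illustrative assumptions for the sketch, not taken from the actual system.

```python
import math

# Hypothetical mini-inventory of CCG categories paired with logical-form
# templates (illustrative only; the real system induces these from data).
CATEGORIES = [
    ("N", "lambda x.CITY(x)"),
    ("S\\NP", "lambda x.MAJOR(x)"),
]

def features(word, category, lf):
    """Toy feature function: indicator features on (word, category)
    and (word, logical form) pairs."""
    return {("w-cat", word, category): 1.0, ("w-lf", word, lf): 1.0}

def score(weights, word, category, lf):
    """Linear score of a candidate lexical item under the current weights."""
    return sum(weights.get(f, 0.0) * v
               for f, v in features(word, category, lf).items())

def lexical_distribution(weights, word):
    """Distribution over lexical items for `word`: a softmax over all
    candidate (category, logical form) pairs, rather than membership
    in an explicit, fixed list of lexical entries."""
    scores = {(c, lf): score(weights, word, c, lf) for c, lf in CATEGORIES}
    z = sum(math.exp(s) for s in scores.values())
    return {item: math.exp(s) / z for item, s in scores.items()}

# Example: a learned weight favoring the noun analysis of "city".
weights = {("w-cat", "city", "N"): 2.0}
dist = lexical_distribution(weights, "city")
```

Because every word receives some probability mass for every candidate lexical item, the parser can hypothesize analyses for words never paired with a given category in training, which is the source of the generalization discussed above.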
A key aim in Natural Language Processing is to robustly map from natural language sentences to formal representations of their underlying meaning. Recent work has addressed this problem by learning semantic parsers given sentences paired with logical meaning representations. In this project, we developed new models and learning algorithms for recovering lexical structure, in the context of mapping sentences to logical form. The approach was inspired by linguistic theories of the lexicon, but directly motivated by the limitations observed in current, state-of-the-art learning algorithms. As proposed, we showed that a new probabilistic learning approach for lexical generalization can simultaneously achieve the goals of (1) language-independent learning, (2) robustness when analyzing natural, unedited text, and (3) reduced data annotation effort, in a computationally efficient manner that scales to large learning problems. The algorithm we designed induced a Combinatory Categorial Grammar (CCG) that was modified to replace the traditional, explicit list of lexical items in the lexicon with a distribution over lexical items, allowing significant generalization in the construction of possible syntactic and semantic structures for given input words. Modifying the CCG lexicon in this manner greatly increased the potential to generalize from the available training data without sacrificing the scalability that comes from working within an established grammar formalism for which efficient learning and parsing algorithms have been developed. This work had impact at the algorithmic level and through applications, including achieving state-of-the-art performance on a number of benchmark datasets for evaluating advanced natural language interfaces to databases for non-technical users.