RI: Small: A Bayesian Approach to Dynamic Lexical Resources for Flexible Language Processing

Palmer, Martha

Abstract

This project uses statistical models and human judgment to determine dynamic, probabilistic representations of extensible usages of words; these representations are suitable for incorporation into VerbNet, a lexical resource widely used in the Natural Language Processing (NLP) community. Existing lexical resources reflect a binary notion of usages as grammatical or not. However, in actual language use, forms vary in acceptability; moreover, the process of coercion extends words beyond their standard usages. For example, a strictly intransitive action verb such as 'sneeze' may be used as in 'She sneezed the foam off the cappuccino', expressing manner of motion. This research takes a two-pronged approach involving extensive use of machine learning and a fundamental shift in the development and use of VerbNet. Specifically, the research develops probabilistic methods for: (1) analyzing usages of verbs in large corpora and incorporating the resulting probabilistic information into VerbNet classes; and (2) representing information about the likelihood of potential constructional coercions and the productivity of such extensions. These developments use the Hierarchical Bayesian Model of Parisien and Stevenson, which is an ideal framework for combining probabilistic reasoning about complex, real-world data with the hierarchically organized VerbNet lexicon. In addition to statistical models, the representations are also informed by human judgments about the use of such constructions. Thus, this research enriches the current symbolic verb representations in VerbNet with probabilistic distributional information, which becomes salient through the influence of construction grammar.
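The idea of attaching probabilistic usage information to verbs within a class hierarchy can be illustrated with a small sketch. This is not the Hierarchical Bayesian Model of Parisien and Stevenson named above; it is a much simpler stand-in that smooths a verb's observed frame counts toward the frame distribution of its class using a Dirichlet prior. The function name, the frame labels, the counts, and the `alpha` and `strength` parameters are all hypothetical and chosen only for illustration.

```python
def smoothed_frame_probs(verb_counts, class_counts, alpha=1.0, strength=5.0):
    """Posterior-mean frame probabilities for one verb.

    Assumption (illustrative only): the verb's frame distribution has a
    Dirichlet prior whose mean is the class-level frame distribution and
    whose total pseudo-count is `strength`.
    """
    frames = sorted(set(verb_counts) | set(class_counts))

    # Class-level distribution with add-alpha smoothing, so frames that are
    # rare or unseen in the class still keep some probability mass.
    class_total = sum(class_counts.values())
    class_probs = {
        f: (class_counts.get(f, 0) + alpha) / (class_total + alpha * len(frames))
        for f in frames
    }

    # Verb-level estimate: observed counts plus pseudo-counts from the class.
    verb_total = sum(verb_counts.values())
    return {
        f: (verb_counts.get(f, 0) + strength * class_probs[f]) / (verb_total + strength)
        for f in frames
    }


# Hypothetical corpus counts, invented for this example.
sneeze_counts = {"intransitive": 98, "caused_motion": 2}
class_counts = {"intransitive": 900, "caused_motion": 60, "transitive": 40}

probs = smoothed_frame_probs(sneeze_counts, class_counts)
for frame, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{frame}: {p:.4f}")
```

Under these assumptions, a frame the verb has never been observed in (here, the hypothetical "transitive" frame) still receives a small nonzero probability inherited from its class, which is one simple way to express graded acceptability rather than a binary grammatical-or-not judgment.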

Encoding verb knowledge probabilistically provides the necessary flexibility to represent extensional constructions and support their appropriate interpretation by NLP systems. This is especially useful for interpretation in new domains and genres, leading to advances in NLP technologies, such as question answering and machine translation, thus improving information access. Additionally, insights into statistical properties of constructions gained through this research are valuable for psycholinguistic models of language acquisition and second language learning.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1116782
Program Officer: Tatiana Korelsky

Budget Start: 2011-09-01
Budget End: 2015-08-31
Fiscal Year: 2011
Total Cost: $300,000

Institution

University of Colorado at Boulder, Boulder, CO, United States
