WordNet is an important lexical resource for research in areas including natural language processing (NLP) and artificial intelligence (AI). This project initiates the development of a radically enhanced version of WordNet, called WordNet+. Constructing WordNet+ involves a novel combination of empirical methods: human annotation, corpus analysis, and machine learning. WordNet+ specifically addresses WordNet's limited ability to identify word senses, a limitation that stems from the sparsity of Boolean (present-or-absent) arcs among sets of synonymous words ("synsets"). First, quantified, oriented arcs will be added among a core set of 5,000 synsets. These arcs reflect evocation: the extent to which the meaning of one synset brings another to mind. After the core synsets are selected, annotators will rate a random subset of 250,000 arcs. The annotators, trained and tested for inter- and intra-annotator reliability, record the strength of their mental associations using a specially designed and tested interface. The remaining arcs will be extrapolated from the manually obtained arcs using machine learning algorithms.
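
Because evocation arcs are oriented, a core of 5,000 synsets has 5,000 × 4,999 = 24,995,000 possible ordered pairs, so 250,000 elicited arcs cover about 1% of them. The Python sketch below draws such a random sample of ordered pairs and includes Pearson correlation as one possible way to score agreement between two annotators who rated the same arcs. The function names, the placeholder synset IDs, uniform sampling without self-pairs, and the choice of Pearson correlation are all assumptions for illustration, not the project's documented procedure.

```python
import random


def sample_oriented_arcs(synsets, k, seed=0):
    """Draw k distinct ordered (source, target) pairs with source != target.

    Assumption: every one of the n * (n - 1) ordered pairs is equally likely.
    (a, b) and (b, a) are different arcs because evocation is oriented.
    """
    n = len(synsets)
    total = n * (n - 1)
    if not 0 <= k <= total:
        raise ValueError("k must be between 0 and n * (n - 1)")
    rng = random.Random(seed)
    arcs = []
    # Sample pair indices without replacement, then decode each index
    # into a (row, column) position that skips the diagonal.
    for idx in rng.sample(range(total), k):
        i, j = divmod(idx, n - 1)
        if j >= i:  # shift past the self-pair (i, i)
            j += 1
        arcs.append((synsets[i], synsets[j]))
    return arcs


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists.

    Used here only as one hypothetical reliability measure.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("ratings with zero variance have no correlation")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    core = [f"synset_{i}" for i in range(5000)]  # hypothetical placeholder IDs
    arcs = sample_oriented_arcs(core, 250_000)
    print(len(arcs), len(set(arcs)))          # 250000 250000
    print(f"{250_000 / (5000 * 4999):.2%}")  # 1.00%
```

Encoding each ordered pair as a single integer lets `random.sample` draw from the 24,995,000 candidates without building the full list of pairs in memory.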

All results will be made available to the research community: the core concepts, the indirect co-occurrence matrices, and all available ratings. Given WordNet's past contributions to a number of diverse disciplines, the initial stages of constructing this research tool should attract broad interest and have a significant impact on related work.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0414072
Program Officer: Tatiana D. Korelsky
Budget Start: 2004-09-01
Budget End: 2006-08-31
Fiscal Year: 2004
Total Cost: $106,000

Organization: Princeton University
City: Princeton
State: NJ
Country: United States
Zip Code: 08540