Machine-readable lexical resources are essential to Natural Language Processing (NLP) applications such as information extraction and machine translation. The largest lexicon is WordNet (WN), with semantic information about more than 150,000 lexical units (LUs). A smaller, independently developed resource is FrameNet (FN), which provides detailed information about the syntactic patterns of LUs. The project investigates ways in which these complementary resources can be combined, using the semantic-syntactic information from FN where available and falling back on the less detailed entries from WN in other cases.
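The fallback strategy described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the two dictionaries are hand-written toy entries rather than real FrameNet or WordNet data, and the names `FN_ENTRIES`, `WN_ENTRIES` and `lookup` are hypothetical, not part of either resource.

```python
from typing import Optional

# Toy entries for illustration only; labels and patterns are invented
# placeholders, not actual FrameNet or WordNet content.
FN_ENTRIES = {
    ("praise", "v"): {"frame": "toy_judgment_frame",
                      "patterns": ["NP V NP", "NP V NP for NP"]},
}
WN_ENTRIES = {
    ("praise", "v"): {"synset": "toy_praise_synset"},
    ("stroll", "v"): {"synset": "toy_stroll_synset"},
}

def lookup(lemma: str, pos: str) -> Optional[dict]:
    """Return the detailed FN-style entry if one exists, else the WN-style entry."""
    key = (lemma, pos)
    if key in FN_ENTRIES:
        # Preferred case: richer semantic-syntactic information is available.
        return {"source": "FN", **FN_ENTRIES[key]}
    if key in WN_ENTRIES:
        # Fallback case: only the less detailed entry is available.
        return {"source": "WN", **WN_ENTRIES[key]}
    return None  # Lexical unit covered by neither toy resource.
```

Under these assumptions, `lookup("praise", "v")` is served from the FN-style table, while `lookup("stroll", "v")` falls back to the WN-style table. A real combination would also have to decide which senses of a word correspond across the two resources, which this sketch does not attempt.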
WN and FN follow fundamentally different design principles. WN groups (near-)synonymous LUs into "synsets," which are interconnected via conceptual and lexical relations to form a semantic network. FN groups LUs according to the "semantic frame" they evoke, that is, a type of event, relation or state together with the participants involved in it. Thus, while antonyms such as _praise_ and _blame_ may be in the same FN frame, they are in different, though interlinked, WN synsets. Moreover, FN frames cover semantically related nouns, verbs and adjectives, whereas WN synsets do not mix parts of speech. Crucially for NLP applications, the resources also differ with respect to sense distinctions. Alignment will be investigated with respect to the following differences: lexical coverage, sense distinctions, taxonomic and other semantic relations, and scalar frames for adjectives. Some 1,000 word senses will be examined in detail to provide an idea of how each of these phenomena is distributed over the entire lexicon.
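The contrast between the two grouping principles can be sketched with toy data. The groupings below are invented for illustration and are not actual FN frames or WN synsets; the identifiers `TOY_FRAMES`, `TOY_SYNSETS`, `TOY_ANTONYM_LINKS` and the helper functions are hypothetical.

```python
# Each lexical unit is a (lemma, part-of-speech) pair.
# A frame-style group may hold antonyms and several parts of speech.
TOY_FRAMES = {
    "toy_judgment_frame": {("praise", "v"), ("blame", "v"), ("praise", "n")},
}
# Synset-style groups hold near-synonyms of a single part of speech.
TOY_SYNSETS = {
    "toy_praise_synset": {("praise", "v"), ("laud", "v")},
    "toy_blame_synset": {("blame", "v"), ("fault", "v")},
}
# Antonymy links the two synsets instead of merging them.
TOY_ANTONYM_LINKS = {frozenset({"toy_praise_synset", "toy_blame_synset"})}

def in_same_group(groups: dict, a: tuple, b: tuple) -> bool:
    """True if some group contains both lexical units."""
    return any(a in members and b in members for members in groups.values())

def mixes_pos(members: set) -> bool:
    """True if the group contains more than one part of speech."""
    return len({pos for _, pos in members}) > 1
```

With these toy groupings, `in_same_group(TOY_FRAMES, ("praise", "v"), ("blame", "v"))` holds, while the same check against `TOY_SYNSETS` fails and the pair is related only through `TOY_ANTONYM_LINKS`. Likewise, `mixes_pos` is true for the frame-style group and false for each synset-style group.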
This theoretical work lays the foundation for constructing a unique, invaluable resource for the NLP community.