This Computing Research Infrastructure planning grant addresses two challenges for automatic systems performing deep semantic processing: identifying the context-appropriate sense of polysemous words, and interpreting the meanings and interrelations of verbs and nouns in event-denoting phrases. Preliminary steps are taken toward aligning and linking four widely used lexical resources (WordNet, FrameNet, PropBank and VerbNet) whose contents and coverage differ but complement one another. Methods for completing the current cross-resource links and computing their full transitive closure are explored and tested. The resulting infrastructure (LexLink) is designed to make the resources fully interoperable, capitalizing on their particular strengths with respect to word sense disambiguation and semantic role labeling.
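
To illustrate what a transitive closure over cross-resource links involves, the following is a minimal sketch, not the project's actual method. It assumes that each link can be treated as a symmetric equivalence between two entries, which is a simplification: real mappings are often many-to-many, which is one reason inferred links need manual evaluation. The function names and the entry identifiers are illustrative placeholders, not data taken from the resources.

```python
from collections import defaultdict
from itertools import combinations


def transitive_closure(links):
    """Return every cross-resource pair implied by the given links.

    Each entry is a (resource, identifier) tuple. Links are treated as a
    symmetric, transitive relation (an assumption of this sketch).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two entries of every link into one group.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each group.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Emit one sorted pair for every two entries from different resources.
    closed = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if a[0] != b[0]:
                closed.add((a, b))
    return closed


def inferred_links(links):
    """Return only the pairs that the closure adds to the original links."""
    original = {tuple(sorted(pair)) for pair in links}
    return transitive_closure(links) - original


# Placeholder links: PropBank-VerbNet and VerbNet-FrameNet.
example_links = [
    (("PropBank", "give.01"), ("VerbNet", "give-13.1")),
    (("VerbNet", "give-13.1"), ("FrameNet", "Giving")),
]
all_pairs = transitive_closure(example_links)
new_pairs = inferred_links(example_links)
```

In this example the closure adds a single FrameNet-PropBank pair that was not stated directly. Pairs inferred in this way are the ones that would need sampling and checking before being trusted.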

Four activities are carried out in the context of planning LexLink. First, a workshop is held where key representatives of the Natural Language Processing and computational semantics communities articulate needs and requirements for the planned resource and offer advice on algorithms, annotation techniques and evaluation. Second, a subset of the cross-resource links for word senses and Semantic Role labels (Agent, Instrument, etc.) produced by the automatic transitive closure is evaluated, yielding estimates of the error rate and leading to fine-tuning of the algorithms. Third, the current best-performing mapping algorithms for word senses and Semantic Role labels are evaluated against a human-annotated Gold Standard. Fourth, new Gold Standard data are created for additional training and testing and to refine existing algorithms. As a whole, the work provides a solid foundation for a resource with significant beneficial impact on a range of natural language applications, including machine translation, text summarization and sentiment analysis, affecting areas such as health care, marketing, and education.
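
As an illustration of evaluating automatically produced links against a Gold Standard, the sketch below computes precision, recall, F1 and an error rate over sets of links. The abstract does not name the metrics that were used, so this choice of measures is an assumption, and the function name and link labels are hypothetical placeholders.

```python
def evaluate_links(predicted, gold):
    """Compare predicted links with gold-standard links.

    Both arguments are iterables of hashable link identifiers.
    Returns precision, recall, F1, and the error rate (1 - precision).
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - precision if predicted else 0.0,
    }


# Placeholder data: four predicted links, three of them in a gold set of five.
gold_links = {"link_a", "link_b", "link_c", "link_d", "link_e"}
predicted_links = {"link_a", "link_b", "link_c", "link_x"}
scores = evaluate_links(predicted_links, gold_links)
```

With these placeholder sets, three of the four predicted links are correct and three of the five gold links are recovered. Applying the same function to a manually checked sample of inferred links would give an estimate of the error rate for the full set.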

Project Report

Major Goals

The goal of the Planning Grant was to advance the state of the art for multilingual information processing by investigating the feasibility of, and laying the groundwork for, a new lexical resource called LexLink: a rich, freely available infrastructure that would constitute a springboard for increasingly deeper levels of automatic semantic representations of sentences in context, leading to improved text understanding and inferencing capabilities. LexLink would draw on existing, independently constructed and freely available large lexical resources: WordNet, FrameNet, VerbNet, OntoNotes, and PropBank, as well as on an existing database, SemLink, which includes partial mappings among these databases.

Major Activities

A workshop was held on May 26, 2012 in Istanbul, Turkey, coinciding with the 8th Language Resources and Evaluation Conference (LREC). The workshop gathered representative members of the broader NLP community to discuss the contents, design and availability of the proposed infrastructure. Researchers, developers and users from academia and industry attended, gave presentations and shared in the discussions. We presented our vision for LexLink as described in the proposal, along with some preliminary alignment work towards creating a larger manually annotated Gold Standard, as well as results from our evaluation of manual and automatic alignment algorithms. Colleagues who had independently undertaken automatic mappings of resources presented and compared their approaches and algorithms. SemLink, a manual mapping of PropBank, grouped WordNet senses, FrameNet and VerbNet for the Wall Street Journal corpus, was described, as was UBY, a German lexical resource that has built on SemLink by extending it with automatic mappings and additional resources.
We collaborated with Professors Eneko Agirre and Mona Diab on the *SEM ("star sem") Shared Task sponsored by the Association for Computational Linguistics (ACL) special interest groups SIGLEX and SIGSEM. *SEM is the joint conference on lexical and computational semantics, which aims to provide a forum for NLP researchers working on different aspects of semantic processing. The overarching theme of the Shared Task proposal is the evaluation of algorithms designed to measure semantic similarity. Semantic similarity is one of the main features that have contributed to the success of current algorithms for the automatic alignment of sense entries, so LexLink provides a very appropriate data set for evaluating different approaches. We provided one of the task data sets, consisting of Gold Standard training and testing data for the alignment of several words and senses across the FrameNet and WordNet resources, and we invited participants to evaluate and compare their algorithms in a systematic way. Descriptions of the participating systems and an evaluation of their performance were presented at the second *SEM conference, co-located with the North American Chapter of the Association for Computational Linguistics and Human Language Technologies 2013 conference (NAACL HLT) in Atlanta, USA, June 9-14, 2013.

Specific Objectives

Our objectives, which guided the activities described above, were to evaluate the scope of our project, based on our existing familiarity with the four databases and the current links among them.
Specifically, we made progress towards:

- identifying discrepant links in existing resources
- migrating event nouns in the WordNet lexical database to other resources
- identifying gaps in coverage by comparing entry lists in the databases
- laying down an infrastructure for the recognition of light verb constructions
- developing methods for the mapping of semantic roles across the resources

Significant Results

The workshop held at the LREC conference led to the formulation of specific goals for future LexLink work, which will benefit a wide user community. Another significant result is the release of an updated SemLink database with the addition of FrameNet mappings.

Key Outcomes

An enhanced version of SemLink was released in May 2013. It incorporates the new links and expansions to existing databases that were undertaken during the planning grant. The new SemLink 1.2 resource contains about 78,000 tagged instances of usage that are mapped among FrameNet, PropBank, VerbNet, and OntoNotes (the latter internally references WordNet).

Intellectual Merit

The work performed under the grant paved the way towards measurable improvements in effective automatic text understanding on a deeper semantic level. Future applications that we expect to benefit from the enhanced resources include crosslingual language processing and automatic reasoning and inferencing, a notoriously difficult problem.

Broader Impact

The resources that were the focus of our work will continue to provide a valuable foundation for a wide range of Natural Language Processing applications of interest to a broad community.
The resources, which specifically address the considerable challenge of semantic interpretation shared by many applications, are available at the following URLs:

- SemLink: http://verbs.colorado.edu/semlink
- WordNet: http://wordnet.princeton.edu
- FrameNet: https://framenet.icsi.berkeley.edu
- PropBank: Available from the Linguistic Data Consortium, www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2004T14
- OntoNotes: Available at no cost from the Linguistic Data Consortium, www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T03
- UBY: www.ukp.tu-darmstadt.de/data/lexical-resources/uby

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1205484
Program Officer: Tatiana Korelsky
Budget Start: 2012-06-01
Budget End: 2013-05-31
Fiscal Year: 2012
Total Cost: $25,000
Name: University of Colorado at Boulder
City: Boulder
State: CO
Country: United States
Zip Code: 80303