SGER: Beyond the Core: A Pilot Project on Cataloguing Grammatical Constructions and Multiword Expressions in English.

Fillmore, Charles; Baker, Collin

Abstract

Most word-centered linguistic annotations of texts proceed by identifying keywords and labeling the phrases around them that show their roles in the meaning structures evoked by the keywords. This procedure misses most idioms (took a turn for the worse) and irregular grammatical patterns (only then would she agree to it). The "Beyond the Core" project is exploring ways of augmenting such annotations with layered representations of multiword units and "non-core" grammatical constructions present in such texts. Toward this end, using FrameNet annotation tools, researchers are finding non-core structures in texts and labeling the phrases in a way that shows how they satisfy formal and semantic constraints dictated by the individual constructions. The "Constructicon", where such information is archived, links each construction with annotated sentences that exemplify it.

Although there is a strong interest in non-core structures in the Computational Linguistics community, researchers don't know how many there are, how important they are in NLP applications, how frequent they are in texts of different kinds, or whether the skills that enable trained linguists to recognize them can be reliably communicated to time-pressured annotators. This empirical study is providing that missing information.

The Constructicon and the full body of annotations will be made available to researchers via the FrameNet website, in both human-browsable and machine-readable form. The data will provide rich material for research on parsing, language understanding, and compositional semantics, and may serve as a training corpus for machine-learning methods of detecting known non-core constructions in raw text.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0739426
Program Officer: Tatiana D. Korelsky

Project Start
Project End
Budget Start: 2007-09-15
Budget End: 2009-02-28
Support Year
Fiscal Year: 2007
Total Cost: $100,454
Indirect Cost

Institution

International Computer Science Institute, Berkeley, CA, United States
