Most word-centered linguistic annotation of text proceeds by identifying keywords and labeling the surrounding phrases according to their roles in the meaning structures those keywords evoke. This procedure misses most idioms ("took a turn for the worse") and irregular grammatical patterns ("only then would she agree to it"). The "Beyond the Core" project is exploring ways of augmenting such annotations with layered representations of the multiword units and "non-core" grammatical constructions present in such texts. Toward this end, using FrameNet annotation tools, researchers are finding non-core structures in texts and labeling the phrases in a way that shows how they satisfy the formal and semantic constraints dictated by the individual constructions. The "Constructicon", where such information is archived, links each construction with annotated sentences that exemplify it.

Although there is strong interest in non-core structures in the computational linguistics community, researchers do not know how many there are, how important they are in NLP applications, how frequent they are in texts of different kinds, or whether the skills that enable trained linguists to recognize them can be reliably communicated to time-pressured annotators. This empirical study is providing that missing information.

The Constructicon and the full body of annotations will be made available to researchers via the FrameNet website, in both human-browsable and machine-readable form. The data will provide rich material for research on parsing, language understanding, and compositional semantics, and may serve as a training corpus for machine-learning methods of detecting known non-core constructions in raw text.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0739426
Program Officer: Tatiana D. Korelsky
Budget Start: 2007-09-15
Budget End: 2009-02-28
Fiscal Year: 2007
Total Cost: $100,454

Name: International Computer Science Institute
City: Berkeley
State: CA
Country: United States
Zip Code: 94704