This workshop is funded by the Political Science Program and the Studies of Policy, Science and Engineering component of the Science and Society Program. The workshop will bring social and computer scientists together to discuss coding techniques in order to reshape the study of social and political phenomena through the combined use of manual and machine annotation. The longer-term goals are to foster a lasting community of interest focused on the development of new annotation research partnerships, and to do so in a way that sets a high scientific standard for consistency, reliability, transparency, and replicability.

Coded text corpora are used for basic and applied research in the social and computational sciences. Yet the manual annotation of text (the "coding") is often conducted in an ad hoc, inconsistent, non-replicable, and unreliable manner. While many researchers from a variety of disciplines stand to benefit from reliably recorded, publicly available, transparent, large-scale observations produced by independent coders, very few current researchers can say with confidence that they know where to acquire, or even how to produce, such annotated text corpora. A primary short-term objective of this workshop is to build on the efforts of a small number of existing projects that feature the manual annotation of text. A related goal is to foster a particular approach to manual coding that is designed to be useful for building algorithms for basic and applied research on social and political issues.

The workshop will be small, project-centric, and by invitation only. All of the workshop participants will be together in a seminar-style setting, focusing on one project at a time for an extended period. Assuming a maximum of six projects, both days will feature 2-hour blocks of time, each devoted to a single project, as well as a wrap-up session devoted to the collaborative summarization of important workshop findings. All participating projects will be required to prepare and disseminate a specific set of materials in advance of the workshop via a workshop wiki.

High-quality manual annotation opens up the possibility of cross-disciplinary studies featuring collaboration between social and computational scientists. This opportunity exists because researchers in the computational sciences, particularly those working in text classification, information retrieval, opinion detection, and variants of natural language processing (NLP), are hungry for the elusive "gold standard" in manual annotation. Reliably coded text corpora of sufficient size and consistency are essential for designing and training NLP algorithms.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 0620673
Program Officer: Stephen Zehr
Project Start:
Project End:
Budget Start: 2006-09-01
Budget End: 2007-08-31
Support Year:
Fiscal Year: 2006
Total Cost: $25,000
Indirect Cost:
Name: University of Pittsburgh
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213