This project builds, tests, and validates an open-source automated system for coding social movement data from electronically available news sources. Although no source is perfect, scholars agree that the best general source of such data, particularly data on social protest, is a compilation of a large number of news sources. The project draws on advances in machine learning from computer science and statistics and combines them with our deep substantive knowledge, as sociologists, of the problems involved in identifying and coding collective action events in news sources. An existing, highly regarded hand-coded data set based on the New York Times serves as the reference for "training" machine learning algorithms, called "classifiers," to recognize the elements of a collective action event in a news article and extract the relevant information. We also collect and hand-code new data drawn from other regional, national, and international news sources to provide additional training sets, broadening the range and variety of protests the system can detect. A supplement to the project provides research experience for undergraduates, who will help collect and code these new data.

The project advances the state of data collection in social science by employing the latest developments in natural language processing and supervised machine learning from computer science and statistics. The result will be an open-source, publicly available system that other researchers can use, improve, and expand.

This project promises to provide an important new methodological tool of broad interdisciplinary value to social scientists and to make it possible to compile collective action data from news sources more efficiently, which can improve both academic scholarship and public policy. All of the code for this project will be released under an open-source license and made available through a public source code repository. The work will be accessible and useful to scholars of social movements, international relations, and foreign policy, as well as to many non-academics, such as foreign policy analysts and decision-makers, journalists, and those interested in computational methods of textual analysis and classification. The ability to code collective action data more efficiently and accurately from news sources has broad policy applicability.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Application #: 1423784
Program Officer: Joseph Whitmeyer
Budget Start: 2014-08-15
Budget End: 2017-07-31
Fiscal Year: 2014
Total Cost: $341,831
Organization: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715