This project will advance understanding of how groups of volunteer contributors to online resources perform in the face of sudden, unexpected events related to their work. These volunteer contributors (collectively, "crowds") produce valuable resources such as Wikipedia articles, data for citizen science projects, and open source software. Although prior research has examined which aspects of crowds and situations lead to high-quality resources, that research normally assumes that the people, groups, and especially situations are relatively stable. In practice, situations often change suddenly through "shocks": the death of a celebrity or a world event can affect related Wikipedia articles, while a software project might release a new version or discover a critical bug. In this project, the investigators will use the public history of Wikipedia articles and of open source projects stored on the GitHub website to analyze how crowds react to shocks, and how those reactions affect the resources they create. To do this they will use theories of individual and group behavior to measure meaningful attributes of both the crowds and the resources they work on, then use data analysis techniques to understand (1) how crowds change during shocks; (2) what attributes of crowds predict high resilience and resource quality in the face of shocks; and (3) how these effects change depending on the type of shock that is experienced. The insights gained from the work will also lead to design recommendations for people who manage the software and communities that enable crowds to create these socially valuable resources.

The work will start by constructing features of crowds and the resources they produce. The focus will be on features that prior empirical work on Wikipedia and work from organization theory suggest will be relevant to performance in collaborative crowdsourcing systems. For crowds, these include features of team composition and participation behavior, such as experience, diversity, and work balance; the amount and tone of team coordination around creating the resources; and the inferred network structure of the collaborators, based on individuals' communication patterns with each other. For resources, these include internal and external popularity and rated or estimated current quality. The investigators will use these features to analyze how crowds perform when facing several kinds of shocks, including changes in resource quality, worker status, and relevant external events. To do this, the investigators will develop algorithms to detect times when a crowd has experienced a shock, then use propensity score matching on attributes such as resource quality and crowd size to find comparison sets of similar crowds that have not experienced a shock. They will then use machine learning classifiers and segmented regression analysis techniques to analyze changes in the composition and behavior of the crowd around resources after shocks occur, relative to how crowds behave around the comparison resources. Once these models have been developed, the team will apply them to predicting both the anticipated resilience of a crowd to a shock and the potential occurrence of shocks internal to the collaboration system.
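As an illustration only, the segmented regression step described above might look something like the following sketch. The abstract does not specify a model, so everything here is an assumption: the function names `segmented_regression` and `shock_effect`, the single-breakpoint model form, and the choice to analyze the difference between a shocked resource and one matched comparison resource are all hypothetical, and the activity counts are synthetic.

```python
import numpy as np

def segmented_regression(y, shock_index):
    """Fit y = b0 + b1*t + b2*post + b3*(t - shock_index)*post by least squares.

    `post` is 1 from the shock onward and 0 before it, so b2 is the jump in
    level at the shock and b3 is the change in trend afterwards.
    (Hypothetical helper; the model form is an assumption.)
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= shock_index).astype(float)
    design = np.column_stack([
        np.ones_like(t),            # baseline level
        t,                          # pre-shock trend
        post,                       # level change at the shock
        (t - shock_index) * post,   # trend change after the shock
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["baseline", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, coef))

def shock_effect(treated, control, shock_index):
    """Fit the segmented model to the treated-minus-control series.

    Differencing against one matched comparison resource is an assumed
    simplification of "relative to the comparison resources".
    """
    diff = np.asarray(treated, dtype=float) - np.asarray(control, dtype=float)
    return segmented_regression(diff, shock_index)

# Synthetic weekly edit counts: 20 weeks, shock at week 10.
weeks = np.arange(20, dtype=float)
after = (weeks >= 10).astype(float)
control = 5.0 + 0.5 * weeks                               # matched resource, no shock
treated = control + 20.0 * after - 1.0 * (weeks - 10) * after

effect = shock_effect(treated, control, shock_index=10)
```

In this synthetic case the fitted `level_change` and `slope_change` recover the jump and decay built into `treated`. Real activity series would be noisy and would call for standard errors, and likely for several comparison resources rather than one.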

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1617820
Program Officer: William Bainbridge
Budget Start: 2016-09-01
Budget End: 2020-08-31
Fiscal Year: 2016
Total Cost: $515,463
Name: Regents of the University of Michigan - Ann Arbor
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109