While crowdsourcing and human computation methods are rapidly transforming the practice of data collection in research and industry, ensuring the quality of the collected data remains difficult in practice and exposes projects that rely on crowdsourced data to significant risk. This reduces the benefits of crowdsourcing for both current adopters and a wider community of potential beneficiaries. Although diverse communities have developed statistical algorithms for quality assurance, the splintered nature of these communities has led to relatively little comparative benchmarking or integration of alternative techniques. A dearth of reference implementations and shared datasets has further hampered progress, as have evaluations based on tightly coupled systems, domain-specific tasks, and synthetic data. This project investigates, integrates, and rigorously benchmarks diverse quality assurance algorithms across a range of tasks, data scales, and operational settings. Overall, technical findings are expected to transform current understanding of quality assurance methods for crowdsourcing, including identifying key limitations of the current state of the art in order to focus ongoing research and innovation where it can have the greatest impact. Reference implementations of key algorithms are designed to support reuse, reproducible findings, continual benchmarking, and ongoing progress. The project will also yield new, sanitized public datasets to support ongoing community benchmarking and shared-task evaluations.
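The abstract does not name the specific quality assurance algorithms to be benchmarked. As a minimal illustrative sketch only, the snippet below shows one widely used baseline for this family of techniques: aggregating redundant crowd labels per item by plain majority vote, or by a hypothetical worker-accuracy-weighted vote. All item identifiers, labels, and accuracy scores are invented for illustration and are not drawn from the project.

```python
# Illustrative sketch only: not the project's method. Shows redundant-label
# aggregation, a common baseline among statistical quality assurance techniques.
from collections import defaultdict

def aggregate(labels_by_item, worker_accuracy=None):
    """Return one consensus label per item from redundant (worker, label) votes.

    If worker_accuracy is given, each vote is weighted by that worker's
    estimated accuracy; otherwise every vote counts equally (majority vote).
    """
    consensus = {}
    for item, votes in labels_by_item.items():
        scores = defaultdict(float)
        for worker, label in votes:
            weight = worker_accuracy.get(worker, 0.5) if worker_accuracy else 1.0
            scores[label] += weight
        consensus[item] = max(scores, key=scores.get)
    return consensus

if __name__ == "__main__":
    # Hypothetical data: three workers each judge two items.
    votes = {
        "q1": [("w1", "relevant"), ("w2", "relevant"), ("w3", "not relevant")],
        "q2": [("w1", "not relevant"), ("w2", "relevant"), ("w3", "not relevant")],
    }
    print(aggregate(votes))                                     # plain majority vote
    print(aggregate(votes, {"w1": 0.9, "w2": 0.6, "w3": 0.7}))  # accuracy-weighted vote
```

Comparative benchmarking of the project's kind would evaluate such baselines against more sophisticated alternatives across tasks, data scales, and operational settings.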

Technical contributions from the project are expected to offset the risk of growing social inequity as online, distributed work becomes increasingly prevalent. Assumptions that crowd workers are unreliable or interchangeable limit the complexity and scope of work that can be successfully accomplished through online crowdsourcing. By limiting the amount of work available online and the opportunities for skilled work, these assumptions further restrict the upward economic mobility achievable via crowd work. By developing effective methods to measure work quality over time and identify trusted workers, it will be possible to differentiate, recognize, and reward quality work, promoting merit-based economic mobility. Educational activities include a new crowdsourcing course designed for college freshmen from diverse backgrounds, a graduate seminar integrating the project's research software, presentations to the student chapter of a professional society, and tutorials and short courses benefiting industry practitioners and researchers. The project will also inform the principal investigator's community advisory and organizational activities, including Advisory Board service for an annual industrial conference and organizing workshops that bring the industry and research communities together. Information and results will be disseminated via the project web site (http://ir.ischool.utexas.edu/career/).
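The abstract does not prescribe how worker quality would be measured over time. The sketch below is a hypothetical illustration of one simple longitudinal signal: an exponentially weighted moving accuracy against known ("gold") answers, used to flag trusted workers. The decay factor, threshold, prior, and data are all assumptions made for the example.

```python
# Hypothetical sketch only: one way a longitudinal quality signal could
# separate trusted workers from unreliable ones, using gold-task outcomes.
def update_accuracy(current, was_correct, decay=0.9):
    """Blend the latest gold-task outcome into a running accuracy estimate."""
    return decay * current + (1.0 - decay) * (1.0 if was_correct else 0.0)

def trusted_workers(history, threshold=0.8, prior=0.5):
    """history: {worker: [bool, ...]} gold-task outcomes in chronological order."""
    scores = {}
    for worker, outcomes in history.items():
        acc = prior
        for was_correct in outcomes:
            acc = update_accuracy(acc, was_correct)
        scores[worker] = acc
    return {worker for worker, acc in scores.items() if acc >= threshold}

if __name__ == "__main__":
    # Invented data: w1 is consistently correct, w2 alternates correct/incorrect.
    history = {"w1": [True] * 20, "w2": [True, False] * 10}
    print(trusted_workers(history))  # w1 converges near 1.0; w2 hovers near 0.5
```

In practice, such scores could feed task routing, pay differentiation, or recognition mechanisms of the kind the project motivates.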

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1253413
Program Officer: Sylvia Spengler
Budget Start: 2013-03-15
Budget End: 2019-02-28
Fiscal Year: 2012
Total Cost: $582,000
Name: University of Texas Austin
City: Austin
State: TX
Country: United States
Zip Code: 78759