The Internet has made available novel sources of data on the preferences, behaviors, and beliefs of massive communities of users. Both researchers and engineers are eager to aggregate and interpret this data. However, websites sometimes fail to incentivize high-quality contributions, leading to data of variable quality. Furthermore, assumptions made by traditional theories of learning break down in these settings.

This project seeks to create foundational machine learning models and algorithms to address and explain the issues that arise when aggregating local beliefs across large communities, and to advance the state of the art in understanding how to motivate high-quality contributions. The research can be split into three directions:

1. Developing mathematical foundations and algorithms for learning from community-labeled data. This direction involves building learning models for data from disparate (potentially self-interested or malicious) sources and using insights from these models to design efficient learning algorithms.

2. Understanding and designing better incentives for crowdsourcing. This direction involves modeling crowdsourcing contributions to determine which features to include in systems to encourage the highest-quality contributions.

3. Introducing novel economically-motivated mechanisms for opinion aggregation. This direction involves formalizing the properties a prediction market should satisfy and making use of ideas from machine learning and optimization to derive tractable market mechanisms satisfying these properties.

This research will have clear impact on industry, especially for web-based crowdsourcing. The PI will pursue her long-term goal of attracting and retaining women in computer science via her involvement in workshops and mentoring programs. Results will be disseminated at www.cs.ucla.edu/~jenn/projects/CAREER.html.

Project Report

The primary goal of this project was to advance the state of the art in eliciting and aggregating high-quality information and beliefs from (online) crowds. The results can be broken down into three primary directions.

First, this research produced novel online learning algorithms for assigning crowdworkers to tasks. Crowdsourcing markets provide a way for requesters to inexpensively obtain distributed labor and information, and have recently become popular among researchers who use them to conduct user studies, run behavioral experiments, and collect data that is easy for humans to generate but difficult for computers. Unlike in traditional labor markets, requesters interact with many workers rapidly and can potentially adjust their behavior as they learn more about salient features of the environment, such as workers’ skill levels, the difficulty of their tasks, and workers’ willingness to accept their tasks at given prices. We addressed the challenge that a requester faces when assigning heterogeneous tasks to workers with unknown, heterogeneous skills. We formalized this "online task assignment problem," provided a task assignment algorithm that is provably near-optimal under the assumption that workers return repeatedly, and evaluated this algorithm on data collected from the popular crowdsourcing platform Amazon Mechanical Turk; a simplified bandit-style sketch of this kind of adaptive assignment appears below. We then extended this line of work to cover classification or labeling tasks in which the quality of work cannot be judged immediately.

Second, this research contributed to a more thorough understanding of performance-based incentives in crowdsourcing systems. We designed and ran randomized behavioral experiments on Amazon Mechanical Turk with the goal of understanding when, where, and why performance-based payments improve work quality, identifying properties of the payment, the payment structure, and the task itself that make such payments most effective. Based on our findings, we proposed a new model of worker behavior that extends the standard principal-agent model from economics to include a worker’s subjective beliefs about his likelihood of being paid. We also designed an algorithm using multi-armed bandit techniques for optimally setting the amounts of performance-based payments for tasks, assuming workers strategically choose their level of effort.

Finally, this research contributed a thorough study of several key research questions surrounding the design of prediction market mechanisms that elicit and aggregate beliefs from crowds of traders. A prediction market is a market in which traders buy and sell securities whose values are contingent on the outcome of a future event. For example, a security may pay off $1 if a Democrat wins the 2016 US Presidential election and $0 otherwise. The market price of such a security is thought to reflect the traders’ collective belief about the likelihood that a Democrat will win. To facilitate trade, a prediction market can be operated by an automated market maker, an algorithmic agent that offers to buy or sell securities at a current market price determined by a pricing mechanism that makes use of the trade history. The market maker provides liquidity in the market, effectively subsidizing the market and rewarding traders for their private information. This is especially useful in "combinatorial markets," which offer securities defined over a large or infinite outcome space and propagate information (in the form of prices) appropriately across logically related securities.
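The project’s near-optimal assignment algorithm is developed in its publications; as a minimal, self-contained illustration of the bandit-style reasoning behind adaptive task assignment, the Python sketch below routes each incoming task to the worker whose estimated accuracy has the highest upper confidence bound, using the generic UCB1 rule. The class and method names are hypothetical, and this is a simplified stand-in rather than the project’s own algorithm.

```python
import math

class UCBTaskAssigner:
    """Minimal UCB1-style sketch (hypothetical API, not the project's
    algorithm): route each task to the worker whose empirical accuracy
    has the highest upper confidence bound."""

    def __init__(self, worker_ids):
        self.counts = {w: 0 for w in worker_ids}   # tasks assigned so far
        self.correct = {w: 0 for w in worker_ids}  # tasks judged correct
        self.total = 0                             # total assignments made

    def pick_worker(self):
        self.total += 1
        # Try every worker once before trusting the accuracy estimates.
        for w, n in self.counts.items():
            if n == 0:
                return w
        # UCB1 index: empirical accuracy plus an exploration bonus that
        # shrinks as a worker accumulates completed tasks.
        def ucb(w):
            mean = self.correct[w] / self.counts[w]
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[w])
            return mean + bonus
        return max(self.counts, key=ucb)

    def record(self, worker, was_correct):
        self.counts[worker] += 1
        self.correct[worker] += int(was_correct)
```

A real deployment must also cope with heterogeneous task types and with settings where the quality of completed work cannot be judged immediately, which is precisely where the project’s extensions come in.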
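As a concrete example of the kind of pricing mechanism just described, the sketch below implements Hanson’s logarithmic market scoring rule (LMSR), a standard cost-function-based market maker. The liquidity parameter b and the two-outcome example are arbitrary choices for illustration, not details taken from the project.

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)), where q[i]
    is the number of outstanding shares of security i and b > 0 is a
    liquidity parameter chosen by the market maker."""
    m = max(x / b for x in q)  # shift by the max for numerical stability
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices are the gradient of the cost function:
    p_i(q) = exp(q_i / b) / sum_j exp(q_j / b). Prices are positive and
    sum to 1, so they can be read as a probability distribution."""
    m = max(x / b for x in q)
    exps = [math.exp(x / b - m) for x in q]
    z = sum(exps)
    return [e / z for e in exps]

def trade_cost(q, delta, b=100.0):
    """A trader who moves the share vector from q to q + delta pays
    C(q + delta) - C(q); a negative value means the trader is paid."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

# Two outcomes: a Democrat wins the election, or does not.
q = [0.0, 0.0]
print(lmsr_prices(q))                        # [0.5, 0.5] before any trades
print(round(trade_cost(q, [10.0, 0.0]), 4))  # cost of 10 "Democrat" shares
print(lmsr_prices([10.0, 0.0]))              # "Democrat" price rises above 0.5
```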
We proposed a general framework for the design of efficient pricing mechanisms over very large or infinite outcome spaces. We took an axiomatic approach, specifying a set of formal mathematical properties that any reasonable market should satisfy, and fully characterized the set of pricing mechanisms that satisfy these properties. Then, using techniques from convex analysis, we provided a method for designing specific pricing mechanisms that satisfy these properties and described how to balance trade-offs between different features of these mechanisms. To better understand and characterize the space of possible pricing mechanisms, we studied extensions of this basic framework that allow the market maker to elicit full distributions over continuous random variables, or to make a guaranteed profit given sufficient volume of trade and disagreement among traders. To make the markets more practical, we derived a principled method of reducing liquidity along certain dimensions of the market when particular information becomes less valuable to the market maker over time; the construction underlying this framework is summarized in the sketch below.

This project also contributed to the training of graduate students who actively participated in the research. It enabled the PI to develop a new graduate seminar on "Mathematical Frameworks of Social Computing" and a tutorial on "Prediction, Belief, and Markets" that was presented at several international conferences as well as the Machine Learning Summer School at UC Santa Cruz. The PI additionally organized several related workshops at international conferences to foster discussion in the research community, and engaged in a variety of outreach activities aimed at recruiting and retaining women in computer science, including serving on the Executive Board of the Women in Machine Learning Workshop and as Program Co-Chair of the first Celebration of Women in Computing in Southern California.
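For reference, the convex-analytic construction at the heart of cost-function-based market making can be summarized in a few lines. The notation below is my own shorthand, not necessarily the publications’; the papers give the precise axioms and conditions.

```latex
% Sketch of a cost-function-based market maker (notation is my own).
% q in R^K: outstanding shares of the K securities.
% Pi: convex hull of the securities' payoff vectors.
% R: a convex function chosen by the market designer.
\[
  C(\mathbf{q}) = \sup_{\mathbf{x} \in \Pi}
      \bigl( \mathbf{q} \cdot \mathbf{x} - R(\mathbf{x}) \bigr),
  \qquad
  \mathbf{p}(\mathbf{q}) = \nabla C(\mathbf{q}) \in \Pi .
\]
% A trader moving the share vector from q to q + delta pays
% C(q + delta) - C(q). Path independence follows because prices are the
% gradient of a single potential function, and keeping prices in Pi
% rules out arbitrage. Taking R to be negative entropy,
% R(x) = b * sum_i x_i log x_i on the probability simplex, recovers the
% LMSR prices from the Python sketch above.
```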

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1054911
Program Officer: Maria Zemankova
Budget Start: 2011-06-01
Budget End: 2015-03-31
Fiscal Year: 2010
Total Cost: $238,627
Name: University of California Los Angeles
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095