How can a large number of questions be answered from few and only partially correct responses? This problem lies at the heart of labeling large collections of unlabeled data, which are common in machine learning and data science. It also lies at the heart of learning about the preferences of people and the quality of items by carrying out surveys. A popular approach to this problem is to crowdsource the labeling or learning task by paying a large number of people small amounts of money to answer questions on the internet through a crowdsourcing platform. However, the quality of the workers' responses varies significantly due to the different abilities of the people and the varying difficulties of the questions. To account for the uncertainty of the responses, each question is assigned to multiple people and their responses are aggregated. However, the assignment process is often agnostic to the people's abilities and the questions' difficulties, since both are unknown a priori. This project will develop algorithms that adapt to the people and questions and thereby significantly reduce the number of responses required to enable machine learning algorithms to perform well and surveys to be informative. Beyond the research objectives, the researchers will pursue educational objectives by integrating parts of this project into a graduate class, promoting undergraduate research, and fostering exchange across disciplines by running an interdisciplinary machine learning seminar.

The core optimization problem in crowdsourcing is to achieve confidence in the final answers at minimal cost, by assigning only a few tasks to the people (or workers). Intuitively, that can be accomplished by posing a question only to the workers best qualified to answer it. This project will develop efficient and practical active schemes for crowdlabeling and crowdsourcing that adaptively choose which question to pose to which worker. For each algorithm, the project will prove rigorous, problem-instance-dependent computational and statistical performance guarantees, as opposed to worst-case performance guarantees. The theoretical results will be complemented with practical open-source implementations and experiments on real-world data.
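To make the idea of adaptive assignment concrete, the sketch below is a minimal, hypothetical illustration (not the project's actual algorithm): it maintains a running reliability estimate for each worker, routes each binary question to the currently most reliable workers, and aggregates their answers by a reliability-weighted vote. The class name `AdaptiveLabeler` and all parameters are invented for illustration.

```python
import random


class AdaptiveLabeler:
    """Toy adaptive crowdlabeling sketch: route each question to the
    workers with the highest estimated reliability and aggregate their
    binary answers with a reliability-weighted vote."""

    def __init__(self, n_workers):
        # Weak uniform prior: 1 agreement out of 2 observations,
        # i.e. every worker starts at estimated reliability 0.5.
        self.agree = [1] * n_workers
        self.total = [2] * n_workers

    def reliability(self, w):
        # Fraction of past responses that agreed with the aggregate answer.
        return self.agree[w] / self.total[w]

    def label(self, responses_fn, k=3):
        """Ask the k currently most reliable workers, take a weighted
        majority vote over answers in {0, 1}, then update each chosen
        worker's estimate by whether it agreed with the aggregate."""
        ranked = sorted(range(len(self.total)),
                        key=self.reliability, reverse=True)
        chosen = ranked[:k]
        votes = {w: responses_fn(w) for w in chosen}
        score = sum(self.reliability(w) * (1 if a == 1 else -1)
                    for w, a in votes.items())
        answer = 1 if score > 0 else 0
        for w, a in votes.items():
            self.agree[w] += (a == answer)
            self.total[w] += 1
        return answer


# Simulated experiment: 5 workers with hidden accuracies answer 200
# binary questions; the labeler never sees the accuracies directly.
rng = random.Random(0)
accuracies = [0.95, 0.9, 0.6, 0.55, 0.5]
labeler = AdaptiveLabeler(len(accuracies))
truths = [rng.randint(0, 1) for _ in range(200)]
n_correct = 0
for t in truths:
    def respond(w, t=t):
        # Worker w answers correctly with its hidden probability.
        return t if rng.random() < accuracies[w] else 1 - t
    n_correct += (labeler.label(respond) == t)
```

In this toy setup the two most accurate workers accumulate higher reliability estimates and end up answering most questions, so the aggregate accuracy exceeds that of any uniform-assignment baseline with the same per-question budget. A real scheme of the kind the abstract describes would also model question difficulty and come with the instance-dependent guarantees mentioned above.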

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-09-01
Budget End: 2021-08-31
Fiscal Year: 2018
Total Cost: $474,322
Name: Rice University
City: Houston
State: TX
Country: United States
Zip Code: 77005