How can a large number of questions be answered from few and only partially correct responses? This problem lies at the heart of labeling large collections of unlabeled data, which are common in machine learning and data science. It also lies at the heart of learning about the preferences of people and the quality of items by carrying out surveys. A popular approach to this problem is to crowdsource the labeling or learning task by paying a large number of people small amounts of money to answer questions on the internet through a crowdsourcing platform. However, the quality of the workers' responses varies significantly due to the different abilities of the people and the varying difficulties of the questions. To account for the uncertainty of the responses, each question is assigned to multiple people and their responses are aggregated. However, the assignment process is often agnostic to the people's abilities and the questions' difficulties, since both are unknown a priori. This project will develop algorithms that adapt to the people and questions and thereby significantly reduce the number of responses required to enable machine learning algorithms to perform well and surveys to be informative. Beyond the research objectives, the researchers will pursue educational objectives by integrating parts of this project into a graduate class, promoting undergraduate research, and fostering exchange across disciplines by running an interdisciplinary machine learning seminar.

The core optimization problem in crowdsourcing is to achieve confidence in the final answers at minimal cost, by assigning only a few tasks to the people (or workers). Intuitively, that can be accomplished by posing a question only to the workers best qualified to answer it. This project will develop efficient and practical active schemes for crowdlabeling and crowdsourcing that adaptively choose which question to pose to which worker. For each algorithm, the project will prove rigorous, problem-instance-dependent computational and statistical performance guarantees, as opposed to worst-case performance guarantees. The theoretical results will be complemented with practical open-source implementations and experiments on real-world data.
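To make the idea of adaptive assignment concrete, the sketch below is a minimal, hypothetical illustration (not the project's actual algorithm): it maintains a running reliability estimate for each worker, routes each binary question to the currently most reliable workers, and aggregates their answers by a reliability-weighted vote. The class name `AdaptiveLabeler` and all parameters are invented for illustration.

```python
import random


class AdaptiveLabeler:
    """Toy adaptive crowdlabeling sketch: route each question to the
    workers with the highest estimated reliability and aggregate their
    binary answers with a reliability-weighted vote."""

    def __init__(self, n_workers):
        # Weak uniform prior: 1 agreement out of 2 observations,
        # i.e. every worker starts at estimated reliability 0.5.
        self.agree = [1] * n_workers
        self.total = [2] * n_workers

    def reliability(self, w):
        # Fraction of past responses that agreed with the aggregate answer.
        return self.agree[w] / self.total[w]

    def label(self, responses_fn, k=3):
        """Ask the k currently most reliable workers, take a weighted
        majority vote over answers in {0, 1}, then update each chosen
        worker's estimate by whether it agreed with the aggregate."""
        ranked = sorted(range(len(self.total)),
                        key=self.reliability, reverse=True)
        chosen = ranked[:k]
        votes = {w: responses_fn(w) for w in chosen}
        score = sum(self.reliability(w) * (1 if a == 1 else -1)
                    for w, a in votes.items())
        answer = 1 if score > 0 else 0
        for w, a in votes.items():
            self.agree[w] += (a == answer)
            self.total[w] += 1
        return answer


# Simulated experiment: 5 workers with hidden accuracies answer 200
# binary questions; the labeler never sees the accuracies directly.
rng = random.Random(0)
accuracies = [0.95, 0.9, 0.6, 0.55, 0.5]
labeler = AdaptiveLabeler(len(accuracies))
truths = [rng.randint(0, 1) for _ in range(200)]
n_correct = 0
for t in truths:
    def respond(w, t=t):
        # Worker w answers correctly with its hidden probability.
        return t if rng.random() < accuracies[w] else 1 - t
    n_correct += (labeler.label(respond) == t)
```

In this toy setup the two most accurate workers accumulate higher reliability estimates and end up answering most questions, so the aggregate accuracy exceeds that of any uniform-assignment baseline with the same per-question budget. A real scheme of the kind the abstract describes would also model question difficulty and come with the instance-dependent guarantees mentioned above.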

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-09-01
Budget End: 2021-08-31
Fiscal Year: 2018
Total Cost: $474,322
Name: Rice University
City: Houston
State: TX
Country: United States
Zip Code: 77005