BIGDATA: F: DKA: CSD: Iterative Crowdsourced Hypothesis Generation

Bagrow, James; Bongard, Joshua; Dodds, Peter; Danforth, Christopher; Hines, Paul

Abstract

Establishing causal relationships (for example, that cigarette smoking causes lung cancer) is one of the most challenging aspects of scientific research. Computers excel at calculation but cannot separate cause and effect from mere correlation. Humans, on the other hand, can draw logical conclusions from their experiences, but in the modern era of Big Data there are far too many potential relationships for them to examine manually. This research aims to build a crowdsourcing web platform that combines the knowledge of interested non-experts (Hunch) with the algorithmic power of computers (Crunch) to discover and test causal relationships in large-scale data. Algorithms identify potential relationships, and users are asked to validate them. Users can also propose their own hypotheses, which can subsequently be validated, creating an accelerating feedback loop of scientific discovery. Systematically discovering causal relationships has the potential for broad societal impact, and virtually anyone with web access can participate directly in this scientific research.
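As a rough illustration of the "Crunch" step described above, the sketch below ranks variable pairs by the strength of their correlation and returns the strongest ones as candidates for the crowd to review. It is a minimal sketch, not the project's method: the function names (`pearson`, `candidate_pairs`), the `top_k` parameter, and the toy data are all assumptions made for illustration, and Pearson correlation stands in for whatever screening statistic the platform actually uses.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlational signal
    return cov / sqrt(var_x * var_y)


def candidate_pairs(columns, top_k=3):
    """Return the top_k variable pairs ranked by absolute correlation.

    `columns` maps a variable name to its list of observed values.
    Correlation only flags a pair as worth a look; whether the link
    is causal is left to human judgment (the "Hunch" step).
    """
    scored = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        scored.append((abs(r), a, b, r))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(a, b, r) for _, a, b, r in scored[:top_k]]


# Hypothetical toy data, invented purely for illustration.
data = {
    "cigarettes_per_day": [0, 5, 10, 20, 30],
    "lung_function": [100, 95, 88, 75, 60],
    "shoe_size": [9, 7, 10, 8, 9],
}
queue = candidate_pairs(data, top_k=2)
for a, b, r in queue:
    print(f"Ask the crowd: is '{a}' causally related to '{b}'? (r = {r:.2f})")
```

In this sketch the strongest pair is placed first in the queue, so limited crowd attention goes to the relationships with the most statistical support.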

To support this goal, the researchers are developing novel statistical methods that determine the data types of crowd-suggested observables on the fly: for example, whether 'wages' and 'gender' are real-valued or binary variables. Because the crowd is a relatively limited resource, machine learning algorithms will identify which substructures in the correlational network are most likely to be causal and direct the crowd's efforts toward them. These efficient, adaptive methods allow causal relationships to be combined into larger chains that explain growing numbers of causes and effects.
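To make the data-type question concrete, the sketch below guesses the type of a crowd-suggested observable from the raw text answers collected for it. The abstract does not describe the project's statistical methods, so this is only a simple heuristic under stated assumptions: the function name `infer_type`, the `max_categories` threshold, and the type labels are all hypothetical.

```python
def infer_type(responses, max_categories=10):
    """Guess the data type of an observable from raw string responses.

    Heuristic order (an assumption, not the project's method):
      1. no usable answers           -> "unknown"
      2. exactly two distinct values -> "binary"
      3. every answer parses as a number -> "real"
      4. few distinct values         -> "categorical"
      5. otherwise                   -> "text"
    """
    # Normalize case and whitespace so "Female" and "female " match.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        return "unknown"

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    distinct = set(cleaned)
    if len(distinct) == 2:
        return "binary"
    if all(is_number(value) for value in cleaned):
        return "real"
    if len(distinct) <= max_categories:
        return "categorical"
    return "text"


# Hypothetical answers, invented for illustration.
print(infer_type(["31000", "45500.50", "28000"]))
print(infer_type(["male", "female", "Female", "male"]))
```

A heuristic like this would misclassify some cases (a numeric variable that happens to take only two values is labeled binary), which is one reason a statistical approach that updates as more answers arrive is the more plausible design.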

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1447634
Program Officer: Sara Kiesler

Budget Start: 2014-09-15
Budget End: 2020-08-31
Fiscal Year: 2014
Total Cost: $599,937

University of Vermont & State Agricultural College, Burlington, VT, United States