Establishing causal relationships -- for example, that cigarette smoking causes lung cancer -- is one of the most challenging aspects of scientific research. Computers excel at calculation, but are unable to separate cause-and-effect from mere correlation. Humans, on the other hand, can make logical conclusions based on their experiences but, in the modern era of Big Data, there are far too many potential relationships for humans to manually examine. This research aims to build a crowdsourcing web platform to use the knowledge of interested non-experts (Hunch) and the algorithmic power of computers (Crunch) to discover and test causal relationships in large-scale data. Algorithms identify potential relationships and users are asked to validate them. Further, users are able to propose their own hypotheses that can subsequently be validated, creating an accelerating feedback loop of scientific discovery. The goal of systematically discovering causal relationships has the potential for broad societal impact, and virtually anyone with web access can participate directly in this scientific research.
To support this goal, the researchers are developing novel statistical methods that determine the data types of crowd-suggested observables on the fly. For example, are 'wages' and 'gender' real-valued or binary variables? Finally, the crowd is a relatively limited resource. To use it efficiently, machine learning algorithms would identify which substructures in the correlational network are most likely to be causal, and then focus the crowd's efforts towards them. These efficient, adaptive methods allow causal relationships to be combined into larger chains that explain growing numbers of causes and effects.