A hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. This implies the possibility of repeating results on nominally equal configurations and then generalizing the results by replaying them on new data sets, and seeing how they vary with different parameters. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated but also computational systems do not suffer from the "biological variation" that plagues the life sciences. Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. While concern about repeatability and generalizability cuts across virtually all natural, computational, and social science fields, no single field has identified this concern as a target of a research effort.
This collaborative project between the University of Utah and New York University consists of tools and infrastructure that support the process of sharing, testing and re-using scientific experiments and results by leveraging and extending the infrastructure provided by provenance-enabled scientific workflow systems. The project explores three key research questions: (1) How can compendia of scientific results be packaged and published so that they are reproducible and generalizable? (2) What are appropriate algorithms and interfaces for exploring, comparing, and re-using results, or for discovering better approaches to a given problem? (3) How can reviewers be helped to generate the experiments that are most informative within a given time or resource limit?
An expected result of this work is a software infrastructure that allows authors to create workflows that encode the computational processes that derive the results (including the data used, the configuration parameters set, and the underlying software), to publish these workflows, and to connect them to the publications where the results are reported. Testers (or reviewers) can repeat and validate results, ask questions anonymously, and modify experimental conditions. Researchers who want to build upon previous work can search, reproduce, compare and analyze experiments and results. The infrastructure supports scientists in many disciplines in deriving, publishing and sharing reproducible results. Results of this research, including the developed software, will be available via the project web site (www.vistrails.org/index.php/RepeatabilityCentral).
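To make the idea of encoding a computational process together with its data, parameters, and software environment concrete, the following is a minimal sketch in Python. It is an illustration only and is not the project's infrastructure or the API of any workflow system; the names `fingerprint`, `run_with_provenance`, `repeat`, and the toy experiment `trimmed_mean` are all hypothetical. It assumes inputs and results are JSON-serializable so that they can be hashed.

```python
import hashlib
import json
import platform
import sys


def fingerprint(obj):
    """Return a SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_provenance(experiment, data, params):
    """Run experiment(data, **params) and return (result, provenance record)."""
    result = experiment(data, **params)
    record = {
        "experiment": experiment.__name__,
        "data_sha256": fingerprint(data),      # which data was used
        "params": params,                      # which configuration was set
        "python": sys.version.split()[0],      # underlying software
        "platform": platform.platform(),
        "result_sha256": fingerprint(result),  # what was obtained
    }
    return result, record


def repeat(experiment, data, record, **overrides):
    """Re-run from a record, optionally overriding parameters.

    Returns the new result and whether it matches the recorded result.
    """
    params = {**record["params"], **overrides}
    result, new_record = run_with_provenance(experiment, data, params)
    same = new_record["result_sha256"] == record["result_sha256"]
    return result, same


def trimmed_mean(values, trim=0):
    """Toy experiment (hypothetical): mean after dropping `trim` values per end."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim] if trim else ordered
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 100]
# Author: run the experiment and keep the record alongside the reported result.
published, record = run_with_provenance(trimmed_mean, data, {"trim": 1})
# Reviewer: repeat under the recorded conditions, then vary one parameter.
_, repeated_ok = repeat(trimmed_mean, data, record)
varied, varied_same = repeat(trimmed_mean, data, record, trim=0)
```

In this sketch, repeating the run under the recorded conditions checks repeatability, while passing an override (here `trim=0`) corresponds to a reviewer modifying the experimental conditions to see how the result varies.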
Ever since Francis Bacon, a hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. When Newton said that he could see farther because he stood on the shoulders of giants, he depended on the truth of his predecessors' observations and the correctness of their calculations. In modern terms, this implies the possibility of repeating results on nominally equal configurations and then generalizing the results by replaying them on new data sets, and seeing how they vary with different parameters. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated but also computational systems do not suffer from the "biological variation" that plagues the life sciences. Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. This has serious implications. There have been several instances in the recent past of mistakes discovered in papers and research. In the absence of reproducibility, it becomes hard and sometimes impossible to verify scientific results. Furthermore, scientific discoveries do not happen in isolation. Important advances are often the result of sequences of smaller, less significant steps. If results are not fully documented, reproducible, and generalizable, it becomes hard to re-use and extend them. The goal of this project is to greatly simplify the process of sharing, testing and re-using scientific experiments and results. Our technology supports the reproducibility committees of the two major conferences in our subfield of computer science: ACM SIGMOD and VLDB.
We have presented tutorials on reproducibility that survey the landscape of available tools. We are developing a community experimental platform that assists authors in creating and publishing reproducible results and that enables the generalization of these results. The aim is to develop a protocol in which very little effort yields a large benefit in reproducibility.