A hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. This implies the possibility of repeating results on nominally equal configurations and then generalizing the results by replaying them on new data sets and seeing how they vary with different parameters. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated, but computational systems also do not suffer from the "biological variation" that plagues the life sciences. Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code that produced the results is seldom available; and configuration parameters change results in unforeseen ways. Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. While concern about repeatability and generalizability cuts across virtually all natural, computational, and social science fields, no single field has identified this concern as a target of a research effort.

This collaborative project between the University of Utah and New York University consists of tools and infrastructure that support the process of sharing, testing, and re-using scientific experiments and results by leveraging and extending the infrastructure provided by provenance-enabled scientific workflow systems. The project explores three key research questions: (1) How can compendia of scientific results be packaged and published so that they are reproducible and generalizable? (2) What are appropriate algorithms and interfaces for exploring, comparing, and re-using results, or for discovering better approaches to a given problem? (3) How can reviewers be aided in generating the experiments that are most informative given a time/resource limit?

An expected result of this work is a software infrastructure that allows authors to create workflows encoding the computational processes that derive their results (including the data used, the configuration parameters set, and the underlying software), and to publish these workflows and connect them to the publications where the results are reported. Testers (or reviewers) can repeat and validate results, ask questions anonymously, and modify experimental conditions. Researchers who want to build upon previous work can search, reproduce, compare, and analyze experiments and results. The infrastructure supports scientists in many disciplines in deriving, publishing, and sharing reproducible results. Results of this research, including the developed software, will be available via the project web site (www.vistrails.org/index.php/RepeatabilityCentral).
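For illustration only, the following minimal Python sketch shows one way such a publishable compendium could be modeled, linking a reported result to the workflow, data, parameters, and software versions that produced it. It is not part of the released software; the class name Compendium, the example values, and the placeholder workflow URL are all hypothetical.

    # Illustrative sketch (not part of the released software): a possible model of a
    # reproducibility "compendium" that ties a published result to everything needed
    # to rerun it.
    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Compendium:
        """A published result together with what is needed to reproduce it."""
        result_id: str                     # e.g., "figure-3" in the paper
        publication_doi: str               # the publication reporting the result
        workflow_url: str                  # workflow encoding the computational process
        input_datasets: List[str]          # data files or dataset identifiers used
        parameters: Dict[str, str]         # configuration parameters that were set
        software_versions: Dict[str, str]  # underlying libraries/tools and versions

        def describe(self) -> str:
            return (f"{self.result_id} from {self.publication_doi}: "
                    f"rerun {self.workflow_url} on {len(self.input_datasets)} dataset(s) "
                    f"with {len(self.parameters)} parameter(s)")


    # Hypothetical entry that a reviewer or researcher could search and rerun.
    example = Compendium(
        result_id="figure-3",
        publication_doi="10.0000/example-doi",                      # placeholder DOI
        workflow_url="https://www.vistrails.org/workflows/12345",   # placeholder URL
        input_datasets=["data/climate_2010.csv"],
        parameters={"smoothing": "0.5", "iterations": "100"},
        software_versions={"vistrails": "2.0", "numpy": "1.6"},
    )
    print(example.describe())

A record of this kind is what would let a reviewer locate a result, rerun its workflow on the original or on new data, and compare the outcomes.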

Project Report

Ever since Francis Bacon, a hallmark of the scientific method has been that experiments should be described in enough detail that they can be repeated and perhaps generalized. When Newton said that he could see farther because he stood on the shoulders of giants, he depended on the truth of his predecessors' observations and the correctness of their calculations. In modern terms, this implies the possibility of repeating results on nominally equal configurations and then generalizing the results by replaying them on new data sets and seeing how they vary with different parameters. In principle, this should be easier for computational experiments than for natural science experiments, because not only can computational processes be automated, but computational systems also do not suffer from the "biological variation" that plagues the life sciences. Unfortunately, the state of the art falls far short of this goal. Most computational experiments are specified only informally in papers, where experimental results are briefly described in figure captions; the code and actual experiments that produced the results are seldom available. In the absence of reproducibility, it becomes hard and sometimes impossible to verify scientific results. Furthermore, scientific discoveries do not happen in isolation: important advances are often the result of sequences of smaller, less significant steps, and if results are not fully documented, reproducible, and generalizable, it becomes hard to re-use and extend them. While there has been renewed interest in the publication of reproducible results, a major roadblock to the more widespread adoption of this practice is that it is hard to derive a compendium that encapsulates all the components (e.g., data, code, parameter settings) needed to reproduce a result. Under NSF sponsorship, our goal in this EAGER award was to design new tools and infrastructure to simplify the process of creating, sharing, and evaluating reproducible experiments.

Intellectual Merit:
- We have developed and released the first version of the proposed reproducibility infrastructure. This infrastructure uses the open-source VisTrails system (www.vistrails.org) to support the life-cycle of publications: their creation, review, and re-use. As scientists explore a given problem, VisTrails systematically captures the provenance of the exploration, including the workflows created and the versions of source code and libraries used. The infrastructure also includes methods to link results to their provenance, reproduce results, explore parameter spaces, interact with results through a Web-based interface, and upgrade the specification of computational experiments to work in different environments and with newer versions of software. Documents (including LaTeX, PowerPoint, Word, wiki, and HTML pages) can be created that link to the provenance information needed to reproduce the results. We have also extended VisTrails to reduce the barrier to entry for novice users; in particular, we created a new tool, CLTools, to help users wrap and package their experiments within VisTrails. This infrastructure was selected as a finalist in the Executable Paper Challenge (www.executablepapers.com/finalists.html).
- We have studied the life-cycle of reproducible experiments and introduced the notion of axes of reproducibility, which we then used to categorize reproducible experiments based on their levels of reproducibility along the different axes.
- We have developed and released ReproZip (https://github.com/fchirigati), an open-source tool that automatically captures the provenance of experiments and packs all the files, library dependencies, and environment variables necessary to reproduce the results. Reviewers can then unpack and run the experiments without having to install any additional software. ReproZip greatly simplifies the process of making experiments reproducible (a conceptual sketch of the packing idea follows this report).
- We have designed a benchmark to help categorize and better understand existing reproducibility systems.

Broader Impacts:
- The results of this project were disseminated in 11 scientific publications. In addition, we have developed an open-source system and contributed to an existing open-source system.
- We participated in, and were finalists in, the Executable Paper Challenge.
- We collaborated with domain scientists to create and publish a reproducible paper.
- We have given a number of talks and tutorials on reproducibility, which helped inform scientists in different domains about reproducibility, from its importance for science (the benefits to authors and to the scientific community in general) to the key challenges associated with publishing reproducible experiments. We have also provided examples of how different communities are approaching the problem and presented solutions and systems that can aid in the creation of reproducible experiments.
- Under the sponsorship of the Sloan Foundation, we held three workshops on reproducibility. Two of the workshops brought together scientists from different domains to discuss both the requirements and the ongoing reproducibility efforts in their domains. The third workshop brought together tool developers to discuss existing systems, identify gaps, and develop a blueprint for a general infrastructure that can support reproducibility in multiple scientific domains.
- We have participated in and supported the ACM SIGMOD reproducibility effort. We have also worked with members of the database research community and the PVLDB chairs to institute reproducibility evaluation for PVLDB 2013.
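To make the packing idea behind a tool like ReproZip concrete, the following Python sketch bundles an experiment's command, environment, and a hand-supplied list of input files into a single archive with a manifest. This is an assumption-laden illustration, not ReproZip's actual implementation (ReproZip discovers dependencies automatically by tracing system calls); the function name pack_experiment and the file names in the example are placeholders.

    # Minimal, illustrative sketch of the "pack" step: record the command, the
    # environment, and the experiment's files in one self-contained archive.
    # NOT ReproZip's implementation; dependencies here are listed by hand.
    import json
    import os
    import platform
    import tarfile
    import time


    def pack_experiment(command, input_files, archive_path="experiment_bundle.tar.gz"):
        """Bundle the input files plus a manifest describing how to rerun `command`."""
        manifest = {
            "command": command,                       # how to rerun the experiment
            "packed_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "platform": platform.platform(),          # OS the experiment ran on
            "python_version": platform.python_version(),
            "environment": dict(os.environ),          # environment variables at pack time
            "files": input_files,                     # data/code the command needs
        }
        with open("manifest.json", "w") as f:
            json.dump(manifest, f, indent=2)
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add("manifest.json")
            for path in input_files:
                tar.add(path)                         # copy each dependency into the bundle
        return archive_path


    if __name__ == "__main__":
        # Placeholder experiment: a script and its input data set.
        bundle = pack_experiment(
            command="python run_analysis.py data/input.csv",
            input_files=["run_analysis.py", "data/input.csv"],
        )
        print("Packed reproducible bundle:", bundle)

Unpacking on another machine would then amount to extracting the archive, reading manifest.json, and rerunning the recorded command in the restored environment.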

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1139832
Program Officer: Maria Zemankova
Budget Start: 2011-04-01
Budget End: 2013-08-31
Fiscal Year: 2011
Total Cost: $190,000
Name: Polytechnic University of New York
City: Brooklyn
State: NY
Country: United States
Zip Code: 11201