The computational component of scientific experiments has been growing steadily, creating an increasing need to reproduce computational results. Science is also increasingly performed by exploring diverse data sets, so there is demand for an easy way to repeat the many transformations applied to them. Software packaged with tools from this project will allow scientists to publish their code in a form that others can use with minimal effort. By eliminating many of the challenges of building, configuring, and running software, it will allow members of the scientific community to more easily reproduce each other's computational results.
Increasingly, entire virtual machines are published so that a recipient does not have to replicate the compute environment, retrieve data and code dependencies, or invest effort in configuring the system. However, this approach scales poorly: the included data sets keep growing, applications built on versatile software libraries carry extraneous functionality, and stock operating system distributions contain code irrelevant to the experiment. This project will design, develop, and evaluate a toolchain that allows scientists to transform their software into specialized applications with all the necessary environmental conditions and the required portions of data sets built directly into the code. The resulting scientific appliances can be distributed for others to explore and verify results without the overhead of shipping extraneous data and code.