Increasingly, scientific breakthroughs are being enabled by advanced computing capabilities that allow researchers to explore growing volumes of digital data. Unfortunately, the infrastructure to design and conduct computational experiments has not kept pace with our ability to gather and generate data, leading to an unprecedented situation: data exploration is now the bottleneck to discovery. In this project, we will build infrastructure that streamlines data exploration, supports a wide array of modern computational resources and tools, and enables transparency and reproducibility of scientific results.

This project builds upon the successful VisTrails open-source system. VisTrails provides unique support for data-intensive research, a comprehensive provenance infrastructure, and a user-centered design. The system has been applied to a broad range of disciplines, including environmental science, physics, and bioinformatics, both for research and educational efforts. With a view towards expanding the impact of the VisTrails system to the computer science community, we will extend it in three significant directions: adding support for multi-threaded and parallel execution and for programming frameworks such as MapReduce; improving the extensibility of the system through support for Java libraries as well as flexible mechanisms for scripting; and designing new infrastructure to simplify the creation and packaging of reproducible experiments. These improvements will enable new research opportunities by providing a platform for computationally demanding experiments and large-scale data analysis, as well as for provenance research. In addition, we hope it will contribute to a broader adoption of verified reproducibility in computer science publications.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1405927
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2014-09-01
Budget End: 2017-08-31
Support Year:
Fiscal Year: 2014
Total Cost: $499,962
Indirect Cost:
Name: New York University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10012