Workflow-based systems have emerged as an alternative to the ad-hoc approaches to data exploration that are widely used in the scientific community. Workflows can capture computational tasks at various levels of detail and systematically record the provenance (history) information necessary for reproducibility, result publication, and sharing. Although the benefits of using scientific workflow systems are well known, the fact that workflows are hard to create and maintain has been a major barrier to wider adoption of the technology in the scientific domain.
The goal of this project is to produce new algorithms and techniques for exploring and re-using useful knowledge embedded in workflow specifications and in the provenance of the data they manipulate. This project addresses key limitations in existing workflow systems. First, it develops a set of usable tools that enable casual users (who do not necessarily have programming expertise) to perform exploratory tasks and solve problems through workflows. These include intuitive user interfaces for manipulating collections of workflows and for querying workflows by example. Second, it builds a scalable provenance management infrastructure to support the efficient execution of these operations.
The research results of this project advance the state of the art and build fundamental knowledge in storing, querying, and re-using the provenance of computational tasks. This project has the potential to impact a variety of applications where the creation and maintenance of workflows is currently a major bottleneck. These include large computational science projects and portals. Furthermore, it makes workflows and workflow technology more accessible to casual users. Through our interdisciplinary collaborations, this project will have an immediate impact in helping improve the scientific discovery process. The involvement of graduate and undergraduate students in the project will provide mentoring opportunities. The PI is committed to recruiting minority students. The results of this project will be disseminated as research papers and as freely available tools at the project website: www.cs.utah.edu/~juliana/projects/NSF-IIS-0746500