III: Small: Scalable Integration and Analysis of the Provenance of Diverse Scientific Data

Gehani, Ashish

Abstract

As scientists gain access to data sets accompanied by automatically generated provenance records, they face the challenge of integrating and analyzing this metadata. Independent sources are likely to have captured provenance at distinct levels of abstraction and with different degrees of completeness, to have used separate sets of identifiers to refer to the same artifacts, processes, and agents, and to have introduced dissimilar semantics in their annotations.

This research studies the problem of semi-automatically integrating and analyzing the provenance of scientific data that originates from diverse sources, with independent annotation schemas, semantics that may overlap only partially, representations at different granularities, and incomplete characterizations of the activity being recorded. In particular, the project (i) develops a formal framework for combining provenance, (ii) provides an extensible software system for provenance ingestion, integration, and analysis, and (iii) creates canonical provenance data sets of various sizes, granularities, and domains that can be used to compare provenance integration and analysis algorithms.
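One piece of the integration problem, reconciling the separate identifiers that different sources use for the same artifact or process, can be sketched as follows. This is not the project's formal framework. It is an illustrative sketch that assumes the equivalences between identifiers are already known as hypothetical `same_as` pairs, canonicalizes them with a union-find (disjoint-set) structure, and then takes the union of the rewritten edges. The identifiers and the OPM-style edge labels (`used`, `wasGeneratedBy`) are invented for the example.

```python
"""Illustrative sketch (not the project's method): merge provenance graphs whose
sources use different identifiers for the same node. Equivalences are assumed to
be given as hypothetical (id_a, id_b) 'same_as' pairs."""

from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relation label, destination node)


class UnionFind:
    """Disjoint-set structure mapping each identifier to a canonical representative."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic choice: the lexicographically smaller root becomes canonical.
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo


def merge_provenance(graphs: Iterable[List[Edge]],
                     same_as: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Rewrite every edge to canonical identifiers and return the union of all graphs."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    merged: Set[Edge] = set()
    for graph in graphs:
        for src, rel, dst in graph:
            merged.add((uf.find(src), rel, uf.find(dst)))
    return merged


if __name__ == "__main__":
    # Two hypothetical sources that name the same process differently.
    source_a = [("proc:42", "used", "file:/data/raw.csv")]
    source_b = [("file:/out/result.csv", "wasGeneratedBy", "proc:align-run-7")]
    for edge in sorted(merge_provenance([source_a, source_b],
                                        same_as=[("proc:42", "proc:align-run-7")])):
        print(edge)
```

Once the two process identifiers are unified, the merged graph connects the raw input to the final result through a single process node, which neither source could show on its own. Discovering the `same_as` pairs in the first place (the semi-automatic part) is outside the scope of this sketch.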

Maintaining a record of all the transformations that data undergoes becomes increasingly critical as analyses grow longer and as the data's sources become older and more diverse. Such provenance metadata can answer a range of queries. For example, when only derived data is preserved, a provenance record can help validate claims about the procedures used to obtain the final results. Concerns about whether privacy-sensitive data (such as information from patient records) has been used in contravention of legal or security policies can be alleviated by checking the provenance records for violations.
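The policy check described above amounts to a reachability query over the provenance graph. The sketch below is a minimal illustration under stated assumptions, not the project's query mechanism: edges are assumed to point from each node to what it depends on (an artifact to the process that generated it, a process to the artifacts it used), and the node names and the set of sensitive artifacts are hypothetical.

```python
"""Illustrative sketch: find sensitive artifacts in an output's lineage.
Dependency edges, node names, and the sensitive set are hypothetical."""

from collections import deque
from typing import Dict, List, Set


def ancestors(deps: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` along dependency edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in deps.get(node, []):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def sensitive_sources(deps: Dict[str, List[str]], output: str,
                      sensitive: Set[str]) -> Set[str]:
    """Return the sensitive artifacts that `output` was transitively derived from."""
    return ancestors(deps, output) & sensitive


if __name__ == "__main__":
    deps = {
        "report.pdf": ["proc:summarize"],
        "proc:summarize": ["cohort.csv", "public_stats.csv"],
        "cohort.csv": ["proc:extract"],
        "proc:extract": ["patient_records.db"],
    }
    # A non-empty result flags a potential policy violation for report.pdf.
    print(sensitive_sources(deps, "report.pdf", {"patient_records.db"}))
```

A non-empty result does not by itself establish a violation. It identifies which lineage paths a reviewer would need to check against the applicable policy.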

More information about the project can be found at: http://spade.csl.sri.com

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1116414
Program Officer: Nan Zhang

Budget Start: 2011-08-01
Budget End: 2016-08-31
Fiscal Year: 2011
Total Cost: $485,868

Institution: SRI International, Menlo Park, CA, United States