Data provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins. The generated evidence supports essential forensic activities such as data-dependency analysis, error/compromise detection and recovery, and auditing and compliance analysis.

This collaborative project focuses on theory and systems supporting practical end-to-end provenance in high-end computing systems. It investigates systems in which provenance authorities accept host-level provenance data from validated provenance monitors in order to assemble a trustworthy provenance record. Provenance monitors externally observe systems or applications and securely record the evolution of the data they manipulate. The provenance record is shared across the distributed environment.
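The relationship between monitors and authorities described above can be illustrated with a minimal sketch. It is not the project's design: the class `ProvenanceAuthority`, the helper `sign_event`, the use of a pre-shared HMAC key as the "validation" mechanism, and the event fields are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import json


def sign_event(key, event):
    """Hypothetical helper: tag an event dict with HMAC-SHA256 under a shared key."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class ProvenanceAuthority:
    """Hypothetical authority that accepts records only from validated monitors."""

    def __init__(self):
        # monitor_id -> shared secret (assumed to be established out of band)
        self._monitor_keys = {}
        # accepted provenance entries, in arrival order
        self.record = []

    def validate_monitor(self, monitor_id, key):
        """Register a monitor as validated (stand-in for a real attestation step)."""
        self._monitor_keys[monitor_id] = key

    def submit(self, monitor_id, event, tag):
        """Append the event to the record if it comes from a validated monitor."""
        key = self._monitor_keys.get(monitor_id)
        if key is None:
            return False  # unknown monitor: reject
        if not hmac.compare_digest(sign_event(key, event), tag):
            return False  # tag mismatch: event altered or wrong key
        self.record.append({"monitor": monitor_id, "event": event})
        return True


# Example: one validated host-level monitor reports a data dependency.
authority = ProvenanceAuthority()
authority.validate_monitor("host-a", b"demo-key")
event = {"process": "simulate", "input": "mesh.dat", "output": "result.dat"}
accepted = authority.submit("host-a", event, sign_event(b"demo-key", event))
rejected = authority.submit("host-b", event, sign_event(b"demo-key", event))
```

In this sketch the authority keeps the record in memory; a shared, distributed record as described in the abstract would need storage and replication that are outside the scope of the example.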

In support of this vision, the project explores tools and systems that identify policy (what provenance data to record), trusted authorities (which entities may assert provenance information), and infrastructure (where to record provenance data). Moreover, provenance collection has the potential to hurt system performance: collecting too much provenance information, or doing so in an inefficient or invasive way, can introduce unacceptable overheads. In response, the project also focuses on ways to understand and reduce the costs of provenance collection.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0937944
Program Officer: Almadena Y. Chtchelkanova
Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $307,073
Indirect Cost:
Name: Pennsylvania State University
Department:
Type:
DUNS #:
City: University Park
State: PA
Country: United States
Zip Code: 16802