Data provenance is the ability to track the history of data: where it resided, who handled it, and which systems stored, forwarded, and processed it. This research builds on the architecture of the digital currency Bitcoin. It develops distributed data ledgers, similar to bookkeeping ledgers, that maintain data history so that it cannot be manipulated by hackers trying to hide their activities. Data consistency guarantees that everyone gets the right answers about where, who, and what, regardless of which ledger is read. This software advances the security of computing systems by making data accountable, especially for online commerce and big data ("the cloud"). It secures forensic information taken from compromised computers for further analysis. It validates whether privacy requirements are being met for medical records. The key outcome is a software prototype that implements the complete system and demonstrates the ability to store, maintain, and update provenance information for real data.
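The tamper-resistance idea described above can be illustrated with a minimal hash-chained ledger. This is only a sketch under stated assumptions, not the project's software: the field names (`where`, `who`, `what`), the function names, and the sample values are all hypothetical, and a single in-memory list stands in for a distributed ledger. Each entry stores the hash of the previous entry, so altering any past entry makes verification fail.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def entry_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an entry body."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(ledger: list, where: str, who: str, what: str) -> dict:
    """Append a provenance entry that is chained to the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {
        "index": len(ledger),
        "where": where,    # hypothetical field: system that held the data
        "who": who,        # hypothetical field: party that handled the data
        "what": what,      # hypothetical field: operation performed
        "prev_hash": prev,
    }
    entry = dict(body, hash=entry_hash(body))
    ledger.append(entry)
    return entry


def verify_ledger(ledger: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["index"] != i or entry["prev_hash"] != prev:
            return False
        if entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


def build_demo_ledger() -> list:
    """Build a three-entry ledger with made-up sample values."""
    ledger = []
    append_entry(ledger, "storage-node-a", "alice", "created")
    append_entry(ledger, "gateway-1", "bob", "forwarded")
    append_entry(ledger, "compute-node-7", "carol", "processed")
    return ledger
```

In this sketch, `verify_ledger(build_demo_ledger())` succeeds, while changing the `who` field of an earlier entry afterwards makes it fail, because the stored hash no longer matches the entry's contents.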

A data provenance framework will be designed, prototyped, evaluated, and then delivered as an Application Programming Interface (API), software library, and distributed service. This work will produce a reusable distributed service architecture that achieves scalability by using distributed services to maintain ledger information. The system builds on the block-chain architecture of the Bitcoin cryptocurrency to maintain provenance metadata securely. It leverages existing tools for provenance data exploration and visualization. Digital signatures from both the server or system and the user create dual information about possession, while distributed ledgers remove control and maintenance of metadata from the user who creates it. The prototype enables research into long-term provenance creation, maintenance, and utilization for workflows in the area of cybersecurity, as well as studies of how to integrate and secure provenance in existing file systems and network services. Opt-in and passive (involuntary) provenance systems will be enabled using the prototyped API, library, and distributed ledgers, bringing data provenance to systems that need it, notably high-assurance cloud computing and scientific workflow systems. The tool can be used to enable reproducibility of published results from archived data and artifacts.
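The dual-signature idea in the paragraph above can be sketched as follows. This is a hedged illustration, not the project's API: every function name, field name, and key here is hypothetical. As a loudly stated assumption, the sketch uses HMAC-SHA256 from the Python standard library as a stand-in for signing. HMAC is a symmetric message authentication code, not a true digital signature, and a real system would presumably use asymmetric signatures such as those used in Bitcoin. Only a digest of the data goes into the record, so the record carries provenance metadata rather than the data itself.

```python
import hashlib
import hmac
import json


def canonical(record: dict) -> bytes:
    """Canonical JSON encoding so both parties sign identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, record: dict) -> str:
    """Stand-in 'signature': HMAC-SHA256 over the record (assumption, see lead-in)."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def make_record(data: bytes, user_id: str, server_id: str, action: str,
                user_key: bytes, server_key: bytes) -> dict:
    """Build a provenance record endorsed by both the user and the server."""
    record = {
        "data_digest": hashlib.sha256(data).hexdigest(),  # metadata only, not the data
        "user": user_id,
        "server": server_id,
        "action": action,
    }
    return {
        "record": record,
        "user_sig": sign(user_key, record),
        "server_sig": sign(server_key, record),
    }


def verify_record(signed: dict, user_key: bytes, server_key: bytes) -> bool:
    """Accept the record only if both endorsements match its current contents."""
    record = signed["record"]
    user_ok = hmac.compare_digest(signed["user_sig"], sign(user_key, record))
    server_ok = hmac.compare_digest(signed["server_sig"], sign(server_key, record))
    return user_ok and server_ok
```

Requiring both endorsements means that neither the user nor the server alone can rewrite a record after the fact, which mirrors the stated goal of removing sole control of the metadata from the user who creates it.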

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1547164
Program Officer: Robert Beverly
Project Start:
Project End:
Budget Start: 2016-01-01
Budget End: 2019-12-31
Support Year:
Fiscal Year: 2015
Total Cost: $321,519
Indirect Cost:
Name: Clemson University
Department:
Type:
DUNS #:
City: Clemson
State: SC
Country: United States
Zip Code: 29634