Most scientific advances of the last 400 years have come not merely from researchers working by themselves, but from a community of scholars cooperating and competing in pursuit of shared goals. Scholarly citation is a critical component of this community: it turns isolated works into a network of scholarship that can be navigated and mined. For centuries, the outcomes of such scholarly endeavors were written publications. With the coming of the digital age, new forms of scholarly output, such as data collections and digital publications, have become commonplace. Unfortunately, the practices of citation and attribution that have been the mainstay of written publications are insufficient for this new digital world. A citation for digital data needs to do more than reference the location of the item; it needs to describe what the data is, where it came from, and how it was produced. This research will yield new techniques, tools, and demonstrations of an extended citation service that uses data provenance, a formal record of how an object came to be in its current form. This extended citation service will facilitate activities such as research reproduction and attribution.

The provenance-enabled data citation system developed in this work will be both embedded in an existing data platform (specifically, Dataverse) and available as a standalone service. The system addresses the following specific data citation challenges. It directly includes executable transformations for a limited but important set of tools: R and SQL. For other tools, it provides a standardized documentation capability to describe transformations. It is sufficiently flexible to serve either as part of a publication workflow, where data is part of a more conventional publication, or in support of a standalone publication. It also provides data summaries.
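
As an illustration only, the kind of record such a citation might carry can be sketched in Python. This is not the project's design or the Dataverse data model: the class names `Transformation` and `DataCitation`, their fields, the rendered citation format, and the placeholder identifier are all hypothetical, chosen to mirror the description above (executable steps for R and SQL, documentation-only steps for other tools). Data summaries are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tools whose transformations are stored as executable code (per the abstract).
EXECUTABLE_TOOLS = {"R", "SQL"}


@dataclass
class Transformation:
    """One provenance step: how the data was changed (hypothetical structure)."""
    tool: str                    # e.g. "R", "SQL", or any other tool name
    description: str             # standardized, human-readable documentation
    code: Optional[str] = None   # source text, expected only for R and SQL

    def is_executable(self) -> bool:
        # A step can be re-run only if it uses a supported tool and has code.
        return self.tool in EXECUTABLE_TOOLS and self.code is not None


@dataclass
class DataCitation:
    """A citation that records what the data is, its origin, and its history."""
    title: str
    authors: List[str]
    source: str      # where the data came from
    location: str    # where the item can be found (placeholder identifier)
    transformations: List[Transformation] = field(default_factory=list)

    def executable_steps(self) -> List[Transformation]:
        return [t for t in self.transformations if t.is_executable()]

    def render(self) -> str:
        # Assumed plain-text layout; real citation styles will differ.
        authors = "; ".join(self.authors)
        total = len(self.transformations)
        runnable = len(self.executable_steps())
        return (
            f"{authors}. {self.title}. {self.location}. "
            f"Derived from {self.source} in {total} step(s), "
            f"{runnable} executable."
        )


# Made-up example values, for illustration only.
citation = DataCitation(
    title="Example survey extract",
    authors=["A. Researcher"],
    source="Example raw survey",
    location="doi:10.0000/EXAMPLE",
    transformations=[
        Transformation("SQL", "Select adult respondents",
                       "SELECT * FROM survey WHERE age >= 18"),
        Transformation("Excel", "Manually removed duplicate rows"),
    ],
)
print(citation.render())
```

In this sketch the SQL step could in principle be re-executed to reproduce the derived data, while the spreadsheet step is preserved only as documentation, which is the distinction the abstract draws between supported and other tools.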

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1448123
Program Officer: Rajiv Ramnath
Project Start:
Project End:
Budget Start: 2015-01-01
Budget End: 2017-12-31
Support Year:
Fiscal Year: 2014
Total Cost: $300,000
Indirect Cost:
Name: Harvard University
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02138