Citation is an essential part of scientific publishing and, more generally, of scholarship. It is used to gauge the trust placed in published information and, for better or for worse, is an important factor in judging academic reputation. Now that so much scientific publishing involves data and takes place through a database rather than conventional journals, how is some part of a database to be cited? More generally, how should data stored in a repository that has complex internal structure and that is subject to change be cited?

The goal of this research is to develop a framework for data citation which takes into account the increasingly large number of possible citations; the need for citations to be both human and machine readable; and the need for citations to conform to various specifications and standards. A basic assumption is that citations must be generated, on the fly, from the database. The framework is validated by a prototype system in which citations conforming to pre-specified standards are automatically generated from the data, and tested on operational databases of pharmacological and Earth science data. The broader impact of this research is on scientists who publish their findings in organized data collections or databases; data centers that publish and preserve data; businesses and government agencies that provide on-line reference works; and on various organizations who formulate data citation principles. The research also tackles the issue of how to enrich linked data so that it can be properly cited.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1302212
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2013-08-01
Budget End
2018-07-31
Support Year
Fiscal Year
2013
Total Cost
$856,290
Indirect Cost
Name
University of Pennsylvania
Department
Type
DUNS #
City
Philadelphia
State
PA
Country
United States
Zip Code
19104