The proposing professional scholarly organizations, the American Astronomical Society and the American Institute of Physics, will conduct a pilot project to deliver the digital data sets that underlie figures and tables in three of the journals that they publish in astronomy and plasma physics. The project will involve developing methods for identifying and acquiring those digital data, as well as for providing access to the actual data objects in the published literature. The proposers will (i) conduct surveys of authors to determine their willingness to share data and their interest in re-using data that other researchers might publish; (ii) convene expert stakeholders for focused workshops on metadata semantics, digital structures and formats, and on practices for peer review of data; (iii) develop and refine publishing production methods to acquire, validate, deliver, maintain, and curate data; and (iv) raise the awareness of scientists about the merits of and prospects for sharing data. The pilot will be assessed in part by quantitative metrics on the submission of data sets for publication and the use of these data sets by readers of the participating journals, and the outcomes will be disseminated through multiple forums to the scholarly publishing and research communities.
Substantial amounts of digital data are produced routinely in the pursuit of scientific exploration. While many of these data can be regarded as ephemeral or intermediate, a considerable portion are carefully analyzed and processed. Such data are the result of a good deal of intellectual effort, and they are often valuable in contexts beyond the narrow research questions under which they were first analyzed. Many of these data (or, more precisely, the representations of these data) are committed to the scholarly record in the form of figures and tables that appear within articles published in scholarly journals. In the digital age, it is desirable to have data available in digital forms that are easily readable by computing applications. In this project, the American Astronomical Society (AAS) and the American Institute of Physics (AIP) worked to make more of the data that underlie the figures and tables in selected journals available to researchers in our communities. The journals that were part of the project were the Astronomical Journal, the Astrophysical Journal, and Physics of Plasmas, leading journals in the disciplines of astrophysics and plasma physics.
Scholarly reports have evolved over centuries. Reports describing scientific results are often based on data, but the data themselves are not always readily available for further study: scholarly communication has evolved toward articles that present representations of data, in tabular form or as charts, graphs, photographs, and the like, rather than the data themselves. Collectively, these charts and graphical representations are called "figures". While not every figure and table published in the journals has available data behind it, a significant number of high-quality datasets are now available within articles in these journals.
A substantial number of the datasets in the astronomy journals are structured, formatted tables that are readable by computers, as opposed to structured data associated with figures. The journals published 63 datasets associated with figures in 2013. It is exciting to note that, as of June 2014, the amount of data associated with figures has increased dramatically, and the number of such datasets anticipated in 2014 is double the number received in 2013. This represents a significant level of adoption in the community as well as in the editorial process.
In order to manage datasets over time, it is necessary to be familiar with the digital file formats that scientists are using, as well as to understand the metadata that is needed to make data discoverable in research systems. One of the activities of this project was to engage disciplinary experts in discussions on those topics, and several workshops were held to exchange ideas. The workshops were also an opportunity for a diverse group of experts to interact and strengthen their mutual understanding of data curation and exchange in each discipline.
Another highly informative activity we undertook during this project was a survey of physicists on their attitudes toward, and practices with, datasets. Some highlights of this study follow. Authors who published in AAS journals were more likely than authors who published in Physics of Plasmas (60% versus 15%, respectively) to report that they had requested, acquired, or worked with datasets from other researchers over the last two years. Based on comments written by respondents, this disparity appears to reflect greater familiarity with the practice of data sharing in the astronomy community rather than a bias against data sharing among Physics of Plasmas authors.
Indeed, many authors in Physics of Plasmas were enthusiastic about the value and importance of data sharing, although apparently they had never had the opportunity to request datasets from other researchers. More than two-thirds of the authors who published in AAS journals reported that they probably or definitely planned to request datasets from other researchers over the next few years. Only one-third of the authors from Physics of Plasmas were reasonably certain that they would use datasets from other researchers; however, a sizable share of Physics of Plasmas authors (44%) reported that they might possibly do so. Virtually none of the authors reported that they would definitely not participate in data sharing. A strong majority of authors (more than three-quarters) agreed with statements about the usefulness of linking datasets to articles, both for their own research and for scientific progress in general. In that context, it is interesting to note what the researchers did with the data they obtained from others. One of the goals of data sharing is to advance knowledge, and the two most common uses of other researchers’ data are certainly in line with that goal. The vast majority (85%) reported that they used the datasets to explore new research questions, a very satisfying result. This is exactly the way science is supposed to work.