If properly realized, the data deluge will be a catalyst for new scientific discovery that fuels advances on grand challenge questions such as climate and social-ecological interactions. Federal agencies such as the National Science Foundation have invested very successfully in repositories, infrastructure, and tools for data-intensive science. Investing in data solutions, however, is not the same as investing in high performance computing resources. A general purpose compute facility can be separated from its use, but data is difficult to separate from its semantics, so general purpose solutions address only part of the problem, and a small part at that. Recognizing the opportunities that could be realized through more strongly integrated efforts, NSF is encouraging a path towards coordinated efforts that satisfy the needs of a broader constituency, one that strives for interoperability and for harmonization of concepts, protocols, and standards nationally and internationally. This proposal is a small but fundamental next step towards building an organization with lasting and significant impact on the broader community engaged in 4th paradigm research and education.

Project Report

The grant provided by the National Science Foundation (NSF) funded a workshop on the topic of research data sharing. The objectives of the workshop were to identify concrete technical activities and events that would advance the science and data science communities around data in a pragmatic and practical way, engage graduate students, and advance thinking and planning around a U.S.-based Data Consortium. The workshop was held in Arlington, Virginia on September 30-October 3, 2012. The 121 participants came from both the U.S. and abroad. U.S. representation included the NSF-funded DataNet and INTEROP projects as well as other U.S. groups, and 28% of the participants came from outside the U.S., including Europe, Australia, and South Africa. Participants included experts from the EUDAT project in Europe and experts in data science, data infrastructure, and data generation. The intellectual merit of the workshop lies in its role as a venue for exchanging information on ways to advance international scientific data sharing. The three-day workshop was divided into two parts: the first 1.5 days focused on topics of broad global interest, and the second 1.5 days focused on the organizational structure of the consortium, which later became the Research Data Alliance (RDA). The major outcome of the workshop was that the RDA is now in existence. Several criteria emerged during the workshop as key to a lasting consortium of data researchers. First, the focus should be on data that advances scientific and scholarly research and innovation. Second, the key work cannot be accomplished in workshops alone; other activities, such as topic-driven technical advancements and hackathons, are needed. Finally, a consortium should encourage and stimulate volunteer community involvement through workshops or meetings. It would likely begin with a handful of passionate people, but membership must be open to everyone.
Additionally, we must engage more than simply today’s thought leaders. Young researcher exchanges between institutions should be supported. The broader impact of the workshop was its direct and relevant contribution to the formation of the Research Data Alliance (RDA) (http://rd-alliance.org). The workshop galvanized the grassroots community of researchers in the United States. This, combined with RDA’s impact- and adoption-oriented governance for advancing data sharing internationally, offers the potential to transform data sharing internationally over the next 5 years. Our original proposal in hosting this workshop was to engage thought leaders, practitioners, and young researchers to advance scientific data management, preservation, and sharing through a long-term, self-sustaining, grassroots organization. This vision has been realized with the Research Data Alliance on a much larger scale than the organizers had hoped.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1238168
Program Officer: Robert Chadduck
Project Start:
Project End:
Budget Start: 2012-06-01
Budget End: 2013-05-31
Support Year:
Fiscal Year: 2012
Total Cost: $98,204
Indirect Cost:
Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401