The Data Conservancy, A Digital Research and Curation Virtual Observatory
PI: Sayeed Choudhury of Johns Hopkins University

Johns Hopkins University (JHU) will create The Data Conservancy (DC), which will research, design, implement, deploy and sustain data curation infrastructure for cross-disciplinary discovery, with an emphasis on observational data. The Data Conservancy project is led by Sayeed Choudhury, Associate Dean for University Libraries at JHU, which has a strong track record of leading the research library community in new directions. The Data Conservancy is creating a new model in which libraries regard digital data as a special collection that must be maintained and served like their other collections. Participating organizations include Cornell University, University of California Los Angeles (UCLA), the University of Illinois at Urbana-Champaign (UIUC), the National Center for Atmospheric Research (NCAR), University Corporation for Atmospheric Research (UCAR), Marine Biological Laboratory (Encyclopedia of Life), the National Snow and Ice Data Center (NSIDC), Space Telescope Science Institute, Fedora Commons, Portico, University of Queensland, and Tessella Support Services.

Besides defining a new role for research libraries in curating and serving special collections of scientific digital data, expected contributions of the Data Conservancy include:
* Definition of new educational curricula and other opportunities in data curation;
* User-centered studies that will help the community and NSF better understand how scientists use, share, and preserve data now, and what factors motivate them to preserve and share data or inhibit them from doing so.

Through collection, preservation and semantic integration of data that are now very difficult to assemble and analyze together, the project is also expected to have a transformative impact on the ability of scientists to answer grand challenge questions that are important to the nation and the world, such as three that relate to the production of greenhouse gases:
* What are the current geographical and temporal distributions of the major pools and fluxes in the global carbon cycle?
* What are the control and feedback mechanisms, both anthropogenic and non-anthropogenic, that determine the dynamics of the carbon cycle?
* What are the dynamics of the carbon-climate-human system into the future, and what points of intervention and windows of opportunity exist for human societies to manage this system?

The Data Conservancy begins with data spanning anthropology, applied mathematics, astronomy, atmospheric sciences, chemistry, earth sciences, crop and soil sciences, ornithology, psychology, physics, and theoretical and applied mechanics. International partnerships augment this initial collection. Evolutionary development of this collection will be guided by the user-centered design process.

Project Report

The Data Conservancy (DC) achieved several accomplishments along the four themes identified within its original proposal: infrastructure research and development, information science and computer science research, broader impacts and sustainability. In addition to numerous publications, posters, book chapters and presentations, notable examples of accomplishments within each of these themes include:

Infrastructure Research and Development
* DC software including data archive, packaging tool and reference user interface
* Successful use of the packaging tool within Johns Hopkins University (JHU), the DataNet SEAD project and the Site-Based Data Curation project (an Institute of Museum and Library Services funded project focused on curation of Yellowstone National Park data)

Information Science/Computer Science
* Multiple ethnographic studies ranging from deep exploration of the Sloan Digital Sky Survey to broad-based examination of data practices across disciplines
* Development of the Data Practices and Curation Vocabulary (DPCVocab)
* Initial research into the concept of observation as a unifying theme for ontology development and a conceptual framework for linking meta-analyses
* New partnerships between domain scientists and data scientists that persist beyond the DataNet grant
* Development of a protocol for verifying archival information of remotely held data

Broader Impacts
* Numerous graduate students and post-docs directly supported by DC
* Development of new courses and curriculum at Illinois and UCLA
* Summer institutes and workshops, including a workshop specifically targeted to underrepresented groups at NCAR
* Training modules used by the Federation of Earth Science Information Partners (ESIP)

Sustainability
* Development of the JHU Data Management Services (JHUDMS), which operates independently of DataNet funding and provides data management consultation and archiving services
* Total cost of ownership study with the JHU Carey Business School
* Capstone projects with Carey Business School that supported development of JHUDMS
* Ongoing development of DC software without DataNet funding
* Branding effort with Maryland Institute College of Art that led to the DC core message of community building

The Data Conservancy began with an ambitious agenda to work across disciplines spanning physical sciences, life sciences and the social sciences. Additionally, JHU Sheridan Libraries has an established data curation program in the humanities. Historical infrastructure development programs have demonstrated the need for this type of comprehensive, holistic strategy. Furthermore, previous infrastructure development programs have also benefited from a longer-term view with an emphasis on social dimensions in addition to technological ones. In many ways, the original DataNet solicitation acknowledged these well-established points.

The National Science Foundation attempted to introduce project management processes into the DataNet program at some point subsequent to the development of the original solicitation. It is worth noting that project management is fundamentally focused on risk management, and arguably the biggest risk to any project or program is scope creep. The scope creep encountered within the DataNet program led to continuous adjustments within the Data Conservancy. The original Data Conservancy proposal focused on research about community requirements that would inform infrastructure design and curation strategy through an iterative process of prototyping. NSF subsequently introduced management structures and requirements that were misaligned with the original goals of the DataNet solicitation and program. Furthermore, the often-changing management processes and requirements diverted a disproportionate amount of resources away from achieving project goals.
Despite these issues and NSF’s decision to reduce its funding, the Data Conservancy accomplished a remarkable amount of work along the four dimensions of infrastructure research and development, information science research, broader impacts and sustainability. While the Data Conservancy team feels proud of its accomplishments, it is important to state that there was much more potential that remains partially or even wholly unaddressed.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Cooperative Agreement (Coop)
Application #: 0830976
Program Officer: Robert Chadduck
Project Start:
Project End:
Budget Start: 2009-08-01
Budget End: 2014-07-31
Support Year:
Fiscal Year: 2008
Total Cost: $10,085,120
Indirect Cost:

Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218