PI: Reagan W. Moore, University of North Carolina at Chapel Hill
Traditional preservation environments support fundamental preservation activities such as appraisal, accession, arrangement, description, preservation, and access. Digital preservation environments also support information about individual records, management of data encoding formats, migration to new forms of technology, management of massive archives, and validation of trustworthiness assessment criteria. We propose the development of a reference implementation for digital preservation environments based on data grid technology. A reference implementation defines the assessment criteria, the management policies, and the preservation procedures needed for long-term preservation.
The reference implementation will demonstrate the basic mechanisms needed to support long-term preservation of digital records. Ten years of experience applying distributed data management systems to preservation projects have made the components of a reference implementation well understood. The components include mechanisms to automate the management of the preservation environment; to automate the validation of assessment criteria for trustworthiness, authenticity, integrity, and chain of custody; and to support the incorporation of new technology through extensible policies, extensible procedures, and extensible sets of system information. The goal is to minimize the labor required to manage the massive US record collections (measured in petabytes of data and billions of files), while preserving the essential properties of the records.
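As an illustration only, automated validation of two of the assessment criteria named above (integrity and chain of custody) might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: the `PreservedRecord` class, the `accession` and `validate_integrity` functions, and the choice of SHA-256 checksums are all hypothetical and are not taken from the proposal or from any data grid software.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class PreservedRecord:
    """Hypothetical record entry: file location, checksum taken at accession, custody log."""
    path: Path
    expected_sha256: str
    custody_log: list = field(default_factory=list)


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_event(record: PreservedRecord, agent: str, event: str) -> None:
    """Append a (UTC timestamp, agent, event) entry to the record's custody log."""
    timestamp = datetime.now(timezone.utc).isoformat()
    record.custody_log.append((timestamp, agent, event))


def accession(path: Path, agent: str) -> PreservedRecord:
    """Register a file: store its checksum and log who accessioned it."""
    record = PreservedRecord(path=path, expected_sha256=sha256_of(path))
    log_event(record, agent, "accession")
    return record


def validate_integrity(record: PreservedRecord, agent: str) -> bool:
    """Re-compute the checksum, compare it with the stored one, and log the outcome."""
    ok = record.path.exists() and sha256_of(record.path) == record.expected_sha256
    log_event(record, agent, "integrity-ok" if ok else "integrity-failed")
    return ok
```

In this sketch a policy would simply be a schedule on which `validate_integrity` runs over every record; the custody log then documents each check, so the assessment itself becomes part of the preserved system information.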
The system will be used to inform the National Archives and Records Administration about which approaches result in viable preservation environments, and to demonstrate to the archival community an approach that successfully manages technology evolution.
Outcomes include presentations at national conferences and partner universities, a publication, an exhibit, and follow-on funding:
Invited talk: "Big Data Curation," University of Maryland iSchool, Nov. 21, 2013.
Invited talk: "Making Data Matter," Duke Statistics class (Dalene Stangl), Fall 2013.
Presentation: graduate class at the iSchool at UNC Chapel Hill, by Jefferson Heard.
Panel presentation: "The Future of Digital Humanities," Southern Historical Association (SHA), St. Louis, Nov. 2, 2013.
Workshop presentation: "The Human Face of Crowdsourcing: A Citizen-led Crowdsourcing Case Study," Big Data and the Humanities workshop, IEEE BigData 2013 conference, Santa Clara, Oct. 8, 2013.
Publication: Tobias Blanke, Mark Hedges, and Richard Marciano, "Big Humanities Data Workshop at IEEE Big Data 2013," D-Lib Magazine, Volume 20, Number 1/2, January/February 2014.
Exhibit: "Health is a Human Right: Race and Place in America," Sep. 28, 2013 – Jan. 17, 2014, organized and sponsored by the David J. Sencer CDC Museum (Centers for Disease Control and Prevention), Atlanta, GA. Maps and documents from the CI-BER collection were contributed by Richard Marciano.
Funding: a $10.5 million NSF DIBBs Implementation grant, with Kenton McHenry (NCSA), PI, and Richard Marciano (UNC), 9/1/13-8/31/18. The project is called "Brown Dog," and its topic is "Implementation of software services for data preservation and lifecycle management of heterogenous data, using the CI-BER testbed."