This project will develop novel tools and technologies for automated ingestion and management of preservation processes, and demonstrate their use on scientific, historical, and educational collections covering widely different technical requirements. The technologies will be built on ADAPT (Approach to Digital Archiving and Preservation Technology), a novel robust, reliable, and secure layered architecture developed by the research group using open standards and web technologies. This architecture will evolve gracefully as the underlying technologies change and will interoperate with digital library and grid technologies. Specifically, the team will develop a distributed persistent archive that provides producers, site administrators, and preservation managers with key functionalities for the long-term access and preservation of digital assets. A novel architecture for a deep archive, with provably high reliability and high resiliency against security attacks and system failures, will also be demonstrated. The tools and technologies will be tested and validated on four very distinct and rich collections: (i) an archive of videotaped oral histories provided by the Survivors of the Shoah Visual History Foundation; (ii) children's books in their original languages (currently over 500 books in 30 languages) available through the University of Maryland International Children's Digital Library (ICDL); (iii) a rich historical collection of photographs, drawings, maps, charts, and textual documents available through the National Archives' Electronic Access Project (EAP); and (iv) a wide variety of unique earth science data available through the University of Maryland Global Land Cover Facility (GLCF).
Intellectual Merit. A large portion of the scientific, business, cultural, and government digital information being created today needs to be maintained and preserved for future use over periods ranging from a few years to decades and sometimes centuries. This project introduces a novel framework for planning, managing, and executing ingestion and preservation processes. Novel features include: (i) an automated distributed ingestion architecture that enables secure and verifiable ingestion of digital objects; (ii) policy-driven management of preservation processes, with the ability to continuously audit secure replication, refreshing, and migration, and to track media degradation, file corruption, and format obsolescence; (iii) a peer-to-peer architecture for a deep archive, which guarantees with high probability the integrity and survivability of every object in the deep archive against failures and malicious corruption; and (iv) an evaluation strategy to assess the various approaches developed under this project.
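As a purely illustrative sketch of what verifiable ingestion and replica auditing could look like, the Python fragment below records a digest for each object at ingestion and later checks a replica against those digests. It is not the ADAPT design: the function names (`ingest`, `audit`, `repair`), the use of SHA-256, and the in-memory dictionaries standing in for storage sites are all assumptions made for this example.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an object's bytes."""
    return hashlib.sha256(data).hexdigest()


def ingest(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Build a manifest mapping each object ID to its digest at ingestion time."""
    return {object_id: digest(data) for object_id, data in objects.items()}


def audit(manifest: Dict[str, str], replica: Dict[str, bytes]) -> List[str]:
    """Return the sorted IDs of objects that are missing or corrupted in a replica."""
    failed = []
    for object_id, expected in manifest.items():
        data = replica.get(object_id)
        if data is None or digest(data) != expected:
            failed.append(object_id)
    return sorted(failed)


def repair(manifest: Dict[str, str],
           damaged: Dict[str, bytes],
           healthy: Dict[str, bytes]) -> List[str]:
    """Restore failed objects from another replica, but only if that copy verifies."""
    repaired = []
    for object_id in audit(manifest, damaged):
        candidate = healthy.get(object_id)
        if candidate is not None and digest(candidate) == manifest[object_id]:
            damaged[object_id] = candidate
            repaired.append(object_id)
    return repaired
```

In this sketch, a preservation policy would amount to deciding how often `audit` runs and how many verified replicas must exist before an object is considered safe; a real deep archive would additionally need to protect the manifest itself against tampering.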
Broader Impact. Many communities are interested in long-term preservation of their data and are seeking technological approaches to this challenging problem. This project will address issues of direct interest to all these communities. The research team will introduce technologies and tools to these communities through presentations at major related meetings. Moreover, this project involves collaborations with the National Archives and Records Administration (NARA), NASA at Goddard, and the Survivors of the Shoah Visual History Foundation. Impact on these three organizations will in turn extend to many other communities. Strong letters of support from these three organizations are appended to this proposal. Various tools will be developed and released as well-documented open source software.