The prevailing model for digital preservation is that archives should resemble a fortress: a large, protective infrastructure built to defend a relatively small collection of data against external threats. If data objects were not tethered to repositories, could we create objects that preserve themselves more effectively than repositories or web infrastructure can?

Intellectual Merit

This project will advance the science and engineering of digital preservation in several areas: complex digital object metadata formats, web object design, P2P algorithms, automatic format migration, and object-repository interaction. Some areas, such as the format migration services and the migration preference language, will be new contributions. Other areas will be a matter of novel adaptation, integration, and deployment. Technical highlights of this proposal include:

i) Adopting flocking rules for data objects in digital libraries (DLs), so that complex behavior of the archived data objects emerges from a small number of simple, easily implemented rules.
ii) Making data objects responsible for the conversion and translation of their own holdings according to preferences that were specified at creation time. The data objects will rely on transient, network-accessible services that are not attached to repositories.
iii) Creating a preservation testbed with real content in self-preserving digital objects, in repositories, and on the general web.

Broader Impacts

In many ways, creating and integrating the technology necessary for self-preservation will be straightforward; raising people's expectations from passive digital objects to self-preserving digital objects will be the biggest challenge. For this reason, a series of longitudinal experiments on preservation in general, as well as on self-preserving digital objects, will be integrated across the four-year undergraduate computer science curriculum at Old Dominion University. The complementary relationship between preservation and software engineering will be stressed. The content will be of interest to students, thereby avoiding listless, perfunctory participation. One example is preserving copies of old tests, assignments, and projects: content the students will be highly motivated to preserve. This kind of preservation (digital or nondigital) is already performed to some extent by fraternities and other memory organizations. This process will be codified and democratized with the appropriate information technology tools. The experiences reported by the students will be incorporated into future versions of the preservation software.

Project Report

Open Archives Initiative - Object Reuse and Exchange (ORE): Perhaps the most important and general contribution of this project is our participation in the ORE standardization effort. While ORE is not a preservation effort per se, we see it as a necessary precondition to preservation: you must have a machine-readable list of the web resources being preserved before you can talk about "copying" the resource to a different location. The contribution of ORE is a machine-readable "splash page", known as the Resource Map (ReM). ReMs describe "aggregations", which are the conceptual objects formed by a collection of web resources. As shown in the figure, not all resources that go into a splash page are of interest: logos, navigational links, and the like are typically out of scope when we talk about "refreshing" or "migrating" the content. Similarly, some resources are included in the aggregation but not directly linked from the splash page (e.g., the LaTeX source is one level down from the splash page, but it is an integral part of the aggregation). More about ORE can be found at: www.openarchives.org/ore/

Self-Preserving Digital Objects: We built extensive simulations of self-preserving digital objects and established algorithms by which the graph could grow and repair itself after attack without input from a repository administrator or other "master". Implementing the capabilities of the simulation in the current HTTP environment presents a significant challenge. Inspired by crowdsourcing and social media, we intend to make preservation a social activity. We are finishing the "Preserve Me!" suite of JavaScript and other services that allows people to instrument their splash pages with a Web 2.0-style button (see the image). When interested users click this button, it brings the preservation component of the web resource to life, performing maintenance and messaging activities.
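The distinction drawn in the ORE discussion above, between what a splash page links to and what the aggregation actually contains, can be illustrated with a small sketch. This is not an ORE serialization: the `ResourceMap` class, its field names, the `split_splash_links` helper, and the example.org URIs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceMap:
    """Toy stand-in for a ReM (field names are assumptions, not ORE terms)."""
    uri: str                                      # URI of the ReM itself
    aggregation: str                              # URI of the aggregation it describes
    aggregated: set = field(default_factory=set)  # URIs of the aggregated resources


def split_splash_links(rem, splash_links):
    """Compare the links on a splash page with the aggregation's members."""
    links = set(splash_links)
    in_scope = links & rem.aggregated        # linked and part of the aggregation
    out_of_scope = links - rem.aggregated    # e.g. logos, navigational links
    not_linked = rem.aggregated - links      # e.g. LaTeX source one level down
    return in_scope, out_of_scope, not_linked


rem = ResourceMap(
    uri="http://example.org/paper/rem",
    aggregation="http://example.org/paper/aggregation",
    aggregated={
        "http://example.org/paper/paper.pdf",
        "http://example.org/paper/src/paper.tex",
    },
)
splash_links = [
    "http://example.org/paper/paper.pdf",
    "http://example.org/logo.png",
    "http://example.org/home",
]
in_scope, out_of_scope, not_linked = split_splash_links(rem, splash_links)
```

Only the members of `rem.aggregated` would be candidates for "refreshing" or "migrating"; the logo and the navigation link fall out of scope, and the LaTeX source is kept even though the splash page does not link to it directly.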
The idea is that the object will enlist the user to assist in the preservation process, especially for tasks such as making copies at new sites (e.g., clicking on approvals, solving CAPTCHAs). The status of the digital object is encoded in the ORE ReM, including an ontology of link relations and a variety of common messages that the objects send to each other to inform the network of objects about the arrival of new servers (where they can copy their content), the life or death of objects, and changes in their own metadata status (e.g., refreshing and migrating aggregated resources). We are putting the finishing touches on the Preserve Me! library, and the results will be announced at http://ws-dl.blogspot.com/ when they are ready for broad testing.

HTTP Mailbox: One of the critical pieces of infrastructure we had to implement to move from simulation to implementation is the HTTP Mailbox. We investigated a range of communication options, including Faye, Twitter, StatusNet, and others, but none was suitable, for a variety of reasons. We wanted to adopt Representational State Transfer (REST) approaches for our objects to communicate among themselves, but we continually ran into problems tunneling HTTP traffic through other services (e.g., Twitter). We eventually came up with a "mailbox" approach: rather than having object1 send a message directly to object2 (for a variety of engineering reasons, neither object1 nor object2 is completely RESTful), object1 will (with the assistance of an interactive user) send a message to object2 via the mailbox. Then, when object2 is woken up by a user visiting it and hitting the "Preserve Me!" button, it will check its mailbox and act on what other objects have communicated to it (again, possibly enlisting the help of the interactive user). Although it is a central part of our design, we believe this approach to HTTP messaging will be of interest to other developers as well.
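The store-and-forward pattern described above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not the project's implementation: the `Message`, `Mailbox`, and `PreservedObject` classes, the message kinds `"new-server"` and `"object-dead"`, and the example URIs are all hypothetical, and no real HTTP traffic is involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str     # URI of the object that left the message
    recipient: str  # URI of the object the message is for
    kind: str       # hypothetical message type, e.g. "new-server"
    body: str       # payload, here a single URI


class Mailbox:
    """In-memory stand-in for a shared mailbox service."""

    def __init__(self):
        self._boxes = defaultdict(list)

    def send(self, message):
        # Store the message until the recipient next checks its mailbox.
        self._boxes[message.recipient].append(message)

    def collect(self, recipient):
        # Return and clear everything waiting for this recipient.
        return self._boxes.pop(recipient, [])


class PreservedObject:
    def __init__(self, uri, mailbox):
        self.uri = uri
        self.mailbox = mailbox
        self.known_servers = set()
        self.dead_peers = set()

    def notify(self, peer_uri, kind, body):
        # The sender never contacts the peer directly, only the mailbox.
        self.mailbox.send(Message(self.uri, peer_uri, kind, body))

    def wake_up(self):
        # Stands in for a user clicking the button on this object's page.
        messages = self.mailbox.collect(self.uri)
        for m in messages:
            if m.kind == "new-server":
                self.known_servers.add(m.body)
            elif m.kind == "object-dead":
                self.dead_peers.add(m.body)
        return len(messages)


mailbox = Mailbox()
object1 = PreservedObject("http://example.org/object1", mailbox)
object2 = PreservedObject("http://example.org/object2", mailbox)

object1.notify(object2.uri, "new-server", "http://example.net/")
handled = object2.wake_up()        # object2 acts only when it is woken up
handled_again = object2.wake_up()  # nothing is left in the mailbox
```

The point of the design is that object2 does not need to be running, or even reachable, when object1 has something to say; the message waits until a user next activates object2. This in-memory sketch is independent of the project's own implementation.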
The HTTP Mailbox code can be downloaded from: https://github.com/ibnesayeed/HTTPMailbox

Students No Longer Use Departmental Web Services: When we first began this project, we thought we could monitor the kinds of objects (slides, PDFs, software, etc.) that students would store on the shared departmental web server (e.g., the typical "www.example.edu/~user" URIs) and observe the students' preservation activities as they copied their content to different places (on the premise that students have an implicit preservation plan, even if they have not thought about it in those terms). Between the time we proposed the project and the time we performed it, the world changed. We found that fewer than 3% of the CS students (undergraduate and graduate) were using the departmental web servers, although most had profiles on various public services (e.g., SlideShare, Twitter) with no overlap in content (i.e., no active preservation plan in place). We developed a program that finds and merges these profiles; the description can be found at: http://ws-dl.blogspot.com/2011/08/2011-08-24-kdd-2011-trip-report.html

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0643784
Program Officer: Vijayalakshmi Atluri
Project Start:
Project End:
Budget Start: 2007-04-15
Budget End: 2013-03-31
Support Year:
Fiscal Year: 2006
Total Cost: $540,753
Indirect Cost:
Name: Old Dominion University Research Foundation
Department:
Type:
DUNS #:
City: Norfolk
State: VA
Country: United States
Zip Code: 23508