Long-term preservation of digital data is critical in the sciences, and especially so in the ocean sciences, where the cost of data acquisition is very high (typically $750K per project for ship time alone) and re-acquiring datasets is generally not feasible. Instrumentation, media, and formats are evolving rapidly, more diverse types of sensors are being used, and the volume of data collected is increasing exponentially. Despite dramatic advances in storage technology, raw storage capacity alone will not solve the problems of long-term data retrieval and preservation. Years after data are collected, researchers seeking to re-use them struggle to find information about the context of the original observations. There is a critical lack of metadata infrastructure, and substantial barriers often exist between individual projects, diverse computer systems, and different institutions. Because the requirements for long-term preservation of data and those for independent re-use of data have a great deal in common, both can be met by a common approach: publishing information in a digital library designed for handling scientific data. This award will establish a multi-institution, scalable digital archiving testbed, combining the efforts of the San Diego Supercomputer Center (SDSC), the Scripps Institution of Oceanography (SIO), and the Woods Hole Oceanographic Institution (WHOI). This inter-institutional demonstration project will implement a fully functioning digital library that provides a spectrum of curatorial functions, including automated ingest, metadata extraction, provenance tracking, validation, quality control, and access control. The combined SIO/WHOI digital library will allow data from approximately 60 new oceanographic research projects to be preserved each year, augmenting data from roughly 1600 awards to dozens of institutions, across a wide range of disciplines, over the last four decades.
With the addition of WHOI data, the testbed will provide access to data, photographs, video images, and documents from WHOI ships, Alvin submersible and Jason ROV dives, and deep-towed vehicle surveys. Scalability will be tested, as data volumes range from 3 CDs to 200 DVDs per cruise. An interactive digital library interface will allow users to browse combinations of distributed collections, inspect metadata, and display objects or select them for download. In addition, web services will be implemented to ensure computer-to-computer interoperability, helping to streamline data interchange with NOAA's National Data Centers and with other major research consortia.

Broader impact

Data and technology from this effort will be incorporated into the second national teachers' workshop for the grantees' ERESE (Enduring Resources for Earth Science Education) Project, scheduled for summer 2005 at SIO. These efforts will provide direct access for 3000 students in the classrooms of participating teachers, and will reach a wider audience across the Internet as a component of NSF's National Science Digital Library. Seafloor photos and video from Alvin have enthralled schoolchildren around the world for generations; this effort will make these images much more readily accessible to students, teachers, and the general public.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0455651
Program Officer: Lawrence Brandt
Budget Start: 2005-06-01
Budget End: 2007-12-31
Fiscal Year: 2004
Total Cost: $150,033
Organization: Woods Hole Oceanographic Institution
City: Woods Hole
State: MA
Country: United States
Zip Code: 02543