Emerging directions in computing are likely to dramatically increase storage demands. Traditional drivers for storage (large databases, scientific data sets, and general personal and institutional information needs) continue to increase, while new elements impose even greater, more complex needs. Such new elements include "collected" materials, i.e., digital libraries of textual documents, images, and video, as well as streaming data from widely distributed and widely varied networked sensors integrated with the physical world. With the continued miniaturization of CMOS technology, this last category may increase in scale at a rate similar to Moore's law, rather than being limited by the pace of human creative activities, and it will feed into higher-level processing and storage.

In anticipation of these developments, we propose to provide a set of storage services targeted at supporting research projects with profound current and future storage needs. In particular, we propose

1. experimenting with providing low-cost petabyte-scale generic "object storage", via the use of commodity components and tape-less (i.e., disk-to-disk) backup;
2. providing a fast disk cache for a scientific cluster computing facility, to support large computations that require significant local storage;
3. providing a few thousand environmental sensors to support experimenting with collecting and storing large sensor-derived environmental data sets.

The object storage experiment is central to this project. The cost of providing storage to users today is dominated by administrative costs, and, left to its own devices, the situation will only grow worse. For example, IT support organizations report that the cost of storage hardware accounts for only 10-15% of the fully loaded cost of providing storage, which is dominated by the cost of backups. In addition, the higher-end servers used to dispense bytes reliably are much more expensive than generic low-end servers.
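The cost claim above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming only the reported 10-15% hardware share; the function name `fully_loaded_cost` and the 100,000 hardware figure are hypothetical and chosen purely for illustration.

```python
def fully_loaded_cost(hardware_cost: float, hardware_share: float) -> float:
    """Estimate the fully loaded cost of storage from its hardware cost.

    hardware_share is the fraction of the fully loaded cost attributed to
    hardware (the text reports roughly 0.10 to 0.15).
    """
    if hardware_cost < 0:
        raise ValueError("hardware_cost must be non-negative")
    if not 0 < hardware_share <= 1:
        raise ValueError("hardware_share must be in (0, 1]")
    return hardware_cost / hardware_share


if __name__ == "__main__":
    hardware = 100_000.0  # hypothetical hardware spend, not from the source
    for share in (0.10, 0.15):
        total = fully_loaded_cost(hardware, share)
        print(
            f"hardware share {share:.0%}: "
            f"fully loaded ~ {total:,.0f}, non-hardware ~ {total - hardware:,.0f}"
        )
```

Under these assumptions, the fully loaded cost is roughly 6.7 to 10 times the hardware cost, which is why reducing backup and administrative overhead matters more than cheaper disks alone.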

Our goal is to provide a generic storage service that offers attractive "enterprise" qualities, but at a cost that tracks advances in hardware, and which can grow to a potentially much larger scale. We plan to address the issues of availability and durability by storing objects on multiple, inexpensive servers, some geographically distributed, and without tapes or extensive operations personnel. We believe that recent developments in a number of areas, particularly versioning, location-independent routing, and Byzantine Agreement, will make such a solution viable.
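The intuition behind replicating objects across multiple inexpensive servers can be sketched with a simple probability model. This is an illustration only, not the project's design: it assumes servers fail independently (which geographic distribution is meant to approximate), it ignores Byzantine faults and versioning, and the per-server availability of 0.9 and the function name `availability` are hypothetical.

```python
def availability(p_server: float, replicas: int) -> float:
    """Probability that at least one replica of an object is reachable.

    Assumes each server is independently available with probability
    p_server, so all replicas are down with probability (1 - p_server)**replicas.
    """
    if not 0.0 <= p_server <= 1.0:
        raise ValueError("p_server must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - p_server) ** replicas


if __name__ == "__main__":
    p = 0.9  # hypothetical availability of one low-end server
    for n in (1, 2, 3, 4):
        print(f"{n} replica(s): availability ~ {availability(p, n):.4f}")
```

In this model each added replica multiplies the chance of total unavailability by (1 - p), so a handful of unreliable commodity servers can in principle match a single much more reliable one, provided their failures are not correlated.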

Broader Impact: We plan to use the facilities developed through this grant to support projects affiliated with the Center for Information Technology Research in the Interest of Society (CITRIS), a university-industry-state partnership that aims to create and harness information technology to tackle society's most critical needs. Included among these projects are digital library applications, analyzing traffic videos to learn the characteristic signatures of various traffic incidents databases, cluster computing applications (including finding boundaries between objects in natural images, detecting high-energy neutrinos from astrophysical point sources, and autonomous aerial vehicle support problems), the Ivy "Smart Buildings" test environment (a research infrastructure comprising sensor networks of fixed and mobile motes in a number of buildings on and off campus, used for experimenting with energy-efficient building operation, "demand responsive" electricity management systems, fire safety and disaster preparedness, and structural integrity monitoring), and the Electronic Cultural Atlas Initiative, a global collaboration of nearly 800 affiliates and over 120 projects that supports applications combining global mapping, imagery, and texts, and that provides scholars and other users with digital research resources for the visual presentation of complex combinations of data from multiple disciplines.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0303575
Program Officer: Mohamed G. Gouda
Project Start:
Project End:
Budget Start: 2003-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2003
Total Cost: $1,800,000
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704