Collaborative research between academic institutions is facilitated through the creation of shared data collections. The de facto standard for distributed data management is the Storage Resource Broker (SRB) data grid developed by the Data Intensive Computing Environment group at the San Diego Supercomputer Center (SDSC). The SRB is used extensively in national and international projects in support of data grids for sharing data, digital libraries for publishing and discovering data, persistent archives for preserving data, and real-time systems for federating sensor data. The SRB is generic cyberinfrastructure (software middleware) that is used in production to manage more than 2 PB of data across scientific domains such as astronomy, biology, ecology, high energy physics, medicine, oceanography, seismology, and the humanities.

Intellectual merit: The SRB provides critical cyberinfrastructure by abstracting the main objects (files, streams, metadata, resources, users) needed for data sharing, discovery, and access. Based on our experience developing the SRB and on feedback from a robust user community, we have found that abstraction of data management policies is also needed. To automate the execution of management policies, we are developing the next generation of distributed data management technology, called the integrated Rule Oriented Data System (iRODS). The system integrates rule-based data management with server-side workflow technology for executing micro-services (software modules that perform well-defined tasks). The ability to characterize management policies as rules controlling the execution of micro-services makes it possible for the first time to define both user-specific and community-wide policies, automate the execution of the management policies, and validate assertions about repository management.

The concepts behind iRODS provide a platform for developing customizable and verifiable middleware systems. The rule-based system combines ideas and techniques from guarded logic programs (forward-chaining and backtracking), databases (triggers and ECA rules), and abstract workflow systems. Procedural and declarative semantics control modular software, supporting not only customization but also program verification. The modularity provided by declarative chaining of micro-services enables rule-based validation of remote operation execution. The innovative application of management virtualization supports automation of policy execution for data grids, digital libraries, persistent repositories and archives, as well as educational environments that are expected to contribute to many of the scientific breakthroughs of the 21st century. The new approach enables planned evolution of all components of cyberinfrastructure that support distributed data management.
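The combination of event-condition-action (ECA) rules, chained micro-services, and backtracking described above can be illustrated with a minimal Python sketch. This is not the iRODS rule language or API; the `Rule` class, the `apply_rules` function, and the micro-service names (`checksum`, `copy_to_tape`, `store_on_disk`) are hypothetical names chosen only for illustration. Each rule fires on an event, checks a guard condition, and runs its micro-services in order; if one fails, the completed steps are undone and the next matching rule is tried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]
MicroService = Callable[[Context], bool]  # returns True on success


@dataclass
class Rule:
    event: str                             # event that triggers the rule
    condition: Callable[[Context], bool]   # guard checked before any action
    actions: List[MicroService]            # micro-services chained in order
    recoveries: List[MicroService]         # one undo step per action


def apply_rules(rules: List[Rule], event: str, ctx: Context) -> bool:
    """Try matching rules in order; succeed when one full chain completes."""
    for rule in rules:
        if rule.event != event or not rule.condition(ctx):
            continue
        done: List[MicroService] = []
        for action, recovery in zip(rule.actions, rule.recoveries):
            if action(ctx):
                done.append(recovery)
            else:
                # Undo completed steps in reverse, then backtrack to the next rule.
                for undo in reversed(done):
                    undo(ctx)
                break
        else:
            return True  # every action in this rule succeeded
    return False


# Hypothetical micro-services that record what they did in ctx["log"].
def checksum(ctx: Context) -> bool:
    ctx["log"].append("checksum")
    return True


def undo_checksum(ctx: Context) -> bool:
    ctx["log"].append("undo checksum")
    return True


def copy_to_tape(ctx: Context) -> bool:
    if not ctx["tape_online"]:
        return False
    ctx["log"].append("tape")
    return True


def store_on_disk(ctx: Context) -> bool:
    ctx["log"].append("disk")
    return True


def noop(ctx: Context) -> bool:
    return True


RULES = [
    # Policy for large files: checksum, then archive to tape.
    Rule("ingest", lambda c: c["size"] > 100,
         [checksum, copy_to_tape], [undo_checksum, noop]),
    # Fallback policy: checksum, then keep the file on disk.
    Rule("ingest", lambda c: True,
         [checksum, store_on_disk], [undo_checksum, noop]),
]

ctx: Context = {"size": 500, "tape_online": False, "log": []}
ok = apply_rules(RULES, "ingest", ctx)
```

In this run the first rule matches but fails at the tape step, so its checksum is undone and the fallback rule completes instead. The log in `ctx` then holds the full sequence of steps, which is the kind of record a policy-validation check could be run against.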

This proposal has three aims: to maintain support for the SRB data management software; to develop and support the next generation of rule-based data cyberinfrastructure; and to develop a transition framework for migrating to the new data management infrastructure. The overall goal is to develop evolvable solutions that bring the full power of a national cyberinfrastructure to communities of scientists and engineers.

Broader Impact: The SRB software enables interdisciplinary data sharing as well as discipline-specific collaborations. The customization and abstraction features of data management policies in the iRODS software will help both small and large communities share data using uniform cyberinfrastructure.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Application #: 0721400
Program Officer: Lucille T. Nowell
Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2009-04-30
Support Year:
Fiscal Year: 2007
Total Cost: $2,040,832
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093