The integrated Rule-Oriented Data System (iRODS) software is used in production systems on an international scale, supporting interdisciplinary research projects in seismology, oceanography, astronomy, plant biology, climate change, cognitive science, social sciences, psycholinguistics, and high-energy physics. The iRODS data grid supports collections at scale, from small 200-gigabyte social science collections to multi-petabyte collections of observational data.

This award will fund development of the iRODS infrastructure to incorporate new capabilities that meet the requirements of each user community and support interoperability with domain-specific technologies. These capabilities include an extended metadata system, advanced data transport protocols, unification of authentication environments, user-selected access mechanisms, data analytic micro-services, and real-time data streams. The award will also fund continued consulting support for installation and customization of the software, incorporation of new features developed by an international group of collaborators, and presentation of tutorials and workshops on applications of iRODS. We will develop rulekits that encapsulate standard policy and procedure sets to simplify use by new communities. We will collaborate with the University of Wisconsin on security appraisals to minimize vulnerabilities, and with the Renaissance Computing Institute on integration with cloud computing systems. We will explore research initiatives related to database federation and workflow integration to meet specific requirements of the user communities.

We will develop the mechanisms (improved user interfaces, documentation, and rulekits) that will enable use of the iRODS data grid at scale, by thousands to millions of users. We will support the application of the iRODS infrastructure to NSF research initiatives with the explicit goal of enabling use of shared collections within institutional repositories for education initiatives. Because the policies used within iRODS to control collections can be tuned to meet specific community goals, iRODS can be used to integrate institutional repositories with NSF national research initiatives. iRODS can also link personal laptops into national collaborations, enabling policy-controlled participation by students in research initiatives. By providing standard rulekits, we will enable the creation of reference collections within research projects, institutional repositories, national research projects, and international collaborations.

Project Report

Scientific research is increasingly collaborative and conducted over wide area networks. As people pool data, methods, computation, and skills, it becomes important to develop a sustainable and extensible model for a cyber-infrastructure platform that can be easily deployed and can work across multiple disciplines, providing a uniform interface that enables both long-term and short-term research collaborations. The integrated Rule-Oriented Data System (iRODS) software, developed and deployed under this NSF-funded project, is a model collaboration platform that is being used by large-scale scientific and biomedical communities.

Research collaboration infrastructure relies on two important concepts: virtualization and federation. Virtualization hides differences in technologies and peculiarities of system requirements by exposing a uniform interface to applications and users. For example, two autonomous systems can differ in the way they authenticate a user for access. Middleware can hide these differences and provide a uniform way to authenticate a user, while acting as a proxy user to the two systems. Virtualizing the management of a set of resources for a community of users gives rise to self-contained Virtual Organizations (VOs). Federation mediates interactions across two or more VOs and manages the trust relationship between autonomous communities of users. We explored a particular type of VO, the policy-oriented data management system, which led to the development of the iRODS software. In iRODS, the namespaces for users, resources, metadata, methods, and micro-services are virtualized, abstracting away technological differences in the physical implementation of the system. The user is exposed to a virtual framework that remains unchanged even as technology extension, versioning, obsolescence, and refresh happen under the covers. The stability provided by such a system makes it easier for scientists to concentrate on science rather than on ever-changing technology.

We also explored models of federation across VOs, covering both homogeneous and heterogeneous systems. Two models, strong federation and weak federation, were established and developed as part of the iRODS software infrastructure. The two models support deployment at different levels of trusted interaction between autonomously managed resources and VOs. The project team successfully released multiple versions of the iRODS system software. Sound software engineering practices were adopted in developing iRODS, including repeated in-depth vulnerability assessments, resulting in highly secure community software.

More than a thousand projects around the world use iRODS to implement their virtual data organizations. These include major NSF-funded projects such as the iPlant Collaborative Data Store, the XSEDE Data Replication Service, the Southern California Earthquake Simulation data management, the PetaShare Data Network, the GENI project, and the Data Federation Consortium Platform. Other projects that use iRODS range from federal agency data grids, such as those of NASA, NOAO, and NOAA, to commercial enterprises such as the Sanger Institute, the Broad Institute, and Data Direct Networks. Regional, institutional, and national projects include the SILS Life Time Library, the Texas Digital Library, CineGrid, Chronopolis, the French National Library, the Taiwan National Archive, and the Australian Research Collaboration Service.

The iRODS system, developed with NSF funding, has a large user base and plays a critical role in national-scale projects and enterprises. Transitioning the software into a sustainable environment was therefore critical, so that commitments made to these projects are kept and NSF's investment in this effort continues to have a major impact. To this end, we formed a software consortium with the goal of maintaining the software, sustaining development as needed, promoting new applications, and helping the user community beyond the NSF funding phase. The iRODS Consortium has now been successfully launched with membership from commercial enterprises and community stakeholders, and it continues to grow with a promising and active user base.
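
The authentication example in the report (middleware that gives users one uniform login while acting as a proxy user to systems with different native login schemes) can be illustrated with a minimal sketch. This is not the iRODS API; the class names PasswordSystem, TokenSystem, and Middleware, and all account names and secrets, are hypothetical and chosen only for illustration.

```python
class PasswordSystem:
    """Hypothetical storage system that authenticates with name + password."""

    def __init__(self, accounts):
        self._accounts = dict(accounts)

    def login(self, name, password):
        return self._accounts.get(name) == password


class TokenSystem:
    """Hypothetical storage system that accepts only a pre-issued token."""

    def __init__(self, tokens):
        self._tokens = set(tokens)

    def verify(self, token):
        return token in self._tokens


class Middleware:
    """Uniform login for users; proxy login to each attached system."""

    def __init__(self, users):
        self._users = dict(users)  # virtual user namespace: name -> secret
        self._proxies = {}         # system name -> callable doing the proxy login

    def attach(self, system_name, proxy_login):
        # proxy_login wraps whatever native scheme the back-end system uses.
        self._proxies[system_name] = proxy_login

    def connect(self, user, secret, system_name):
        # Step 1: authenticate the user once, the same way for every system.
        if self._users.get(user) != secret:
            raise PermissionError("unknown user or bad secret")
        # Step 2: authenticate to the back end as the middleware's proxy user.
        if not self._proxies[system_name]():
            raise PermissionError("proxy login rejected by " + system_name)
        return f"{user}@{system_name}"


# Two systems with different native schemes behind one interface.
archive = PasswordSystem({"proxy": "pw"})
cache = TokenSystem({"tok-123"})

grid = Middleware({"alice": "s3cret"})
grid.attach("archive", lambda: archive.login("proxy", "pw"))
grid.attach("cache", lambda: cache.verify("tok-123"))

print(grid.connect("alice", "s3cret", "archive"))
print(grid.connect("alice", "s3cret", "cache"))
```

The user supplies the same credentials regardless of the target system, and a back-end system could be replaced by re-attaching a different proxy login without changing what the user sees. A real deployment would also need secure credential storage and transport, which this sketch omits.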

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1032732
Program Officer: Amy Walton
Budget Start: 2010-10-01
Budget End: 2014-09-30
Fiscal Year: 2010
Total Cost: $1,635,757
Name: University of North Carolina Chapel Hill
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599