Every field of science generates and uses data in many forms, including programs, instrument outputs, papers, notes, applications, simulations, and video and audio recordings. The continuing and evolving challenge for scientists is how to store, access, transform, manage, and curate the variety of data required to conduct their research effectively, and how to share it transparently with other researchers across campus or at other institutions. The MI-OSiRIS project addresses that challenge by combining an object-based, software-defined storage technology with a monitored, managed network infrastructure to give scientists a distributed storage system that lets them access their data directly from resources at any of the participating institutions. Furthermore, MI-OSiRIS uses each institution's existing authentication infrastructure to allow scientists to provide controlled access to their data across all participating institutions. By documenting and publishing designs, code, and operational experiences, the MI-OSiRIS project serves as a replicable model for supporting data-intensive, multi-institutional science collaborations.
MI-OSiRIS implements a Ceph-based, petabyte-scale distributed data system by deploying object storage servers at each participating institution, connecting them via a managed high-speed network, and distributing data based on the specific requirements of each science research domain. Ceph, an open-source storage platform, supports multiple data access methods (traditional file, native object, and block) and allows access, replication, distribution, and integrity to be configured on a per-research-domain basis. MI-OSiRIS is built on low-cost commodity hardware and can deliver gigabytes per second of I/O bandwidth per node. The system monitors and manages the network paths between its partner institutions, science users, and Ceph storage components by strategically deploying perfSONAR instances that have been augmented with a network discovery, monitoring, and management platform (the Network Management Abstraction Layer). Globus Online servers provide access to MI-OSiRIS data from outside the system. In addition, MI-OSiRIS leverages Ceph's software-defined storage capabilities to automate some data-lifecycle management tasks.
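
As an illustration of what per-research-domain configuration could look like in practice, the following minimal Python sketch turns a storage policy for one research domain into the standard `ceph osd pool` commands that would create and tune a matching pool. It is a sketch under stated assumptions, not the project's actual tooling: the `DomainPolicy` class, the `pool_commands` function, the pool naming scheme, the domain name "physics", and the CRUSH rule name "replicate-across-sites" are all hypothetical, and the sketch only builds command strings without running them.

```python
from dataclasses import dataclass

# Map the access methods named in the text to Ceph pool application tags.
ACCESS_TO_APPLICATION = {"file": "cephfs", "object": "rgw", "block": "rbd"}


@dataclass(frozen=True)
class DomainPolicy:
    """Hypothetical per-research-domain storage policy (illustrative only)."""
    domain: str        # research domain name, used to build the pool name
    access: str        # "file", "object", or "block"
    replicas: int      # number of copies Ceph keeps of each object
    crush_rule: str    # name of a CRUSH rule assumed to exist already
    pg_num: int = 128  # placement-group count; the default here is arbitrary


def pool_commands(policy: DomainPolicy) -> list[str]:
    """Return the ceph CLI commands that would apply the policy to one pool."""
    if policy.access not in ACCESS_TO_APPLICATION:
        raise ValueError(f"unknown access method: {policy.access}")
    if policy.replicas < 1:
        raise ValueError("replicas must be at least 1")
    pool = f"{policy.domain}-{policy.access}"  # assumed naming convention
    application = ACCESS_TO_APPLICATION[policy.access]
    return [
        f"ceph osd pool create {pool} {policy.pg_num}",
        f"ceph osd pool set {pool} size {policy.replicas}",
        f"ceph osd pool set {pool} crush_rule {policy.crush_rule}",
        f"ceph osd pool application enable {pool} {application}",
    ]


if __name__ == "__main__":
    # Hypothetical domain that wants three copies spread by a site-aware rule.
    example = DomainPolicy("physics", "file", replicas=3,
                           crush_rule="replicate-across-sites")
    for command in pool_commands(example):
        print(command)
```

Keeping the policy as data and generating commands from it means each research domain's replication and placement choices can be reviewed before anything is applied to a cluster. How data is actually distributed across institutions would depend on the CRUSH rule referenced by name, which this sketch assumes has been defined separately.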