With the growth of cloud computing and the changing ways in which individuals and businesses interact with data, it is increasingly important to manage data efficiently and reliably. The RESAR project tackles the problem of building ever-larger data stores, offering a novel approach that reduces the energy impact of such increases in scale while allowing easier management and adaptation of the system as it ages. In other words, RESAR offers a means to gracefully adapt a storage system to provide increased reliability or performance as demanded by the system's age or the administrator's requirements. This project develops, studies, and optimizes the reliable, energy-efficient storage needed in modern data centers and large-scale data storage environments, and allows such storage systems to gracefully increase their performance and reliability while efficiently scaling to millions of storage devices.
For storage systems to remain feasible and manageable at increasing scales, they need to be self-healing and self-optimizing, able to adapt to aging and new components whilst dynamically recovering from inevitable component failures. Cloud computing promises savings in staffing, as the volume of work in a data center is distributed over fewer, but better-trained, staff. While the increasing scale of such data centers offers greater opportunities for energy-saving measures to become effective, it also sharply increases the likelihood that individual components will fail. This demands that such large-scale storage systems be arranged to survive the failure of multiple components, and to do so with minimal management overhead.
To survive the increasingly likely component failures brought about by the growing number of components in ever-larger data warehouses, storage systems typically employ some form of data replication or redundancy scheme. This strategy not only protects data against loss, but also allows faster access. Unfortunately, doubling or tripling the number of storage devices (or entire data centers) comes at a considerable cost. Alternatively, a site could use erasure-correcting codes that provide protection against device failures while increasing hardware demands by only a smaller increment. But such erasure-correcting schemes offer limited scalability and can complicate the implementation and self-management of a system considerably. The RESAR approach is to employ novel erasure codes that allow faster layout restructuring while offering increased scalability and improved reliability over competing schemes. RESAR allows for restructuring on the fly and, as such, has the added benefit of being complementary to the data relocation tasks necessary for routine maintenance and optimization.
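To make the trade-off concrete, the sketch below contrasts plain replication with a simple single-parity stripe, in which any one lost block can be rebuilt by XOR-ing the surviving blocks with the parity. This is a generic illustration of parity-based redundancy, not the RESAR code itself; the block contents and stripe width are invented for the example.

    # Illustrative sketch only: one XOR parity block protecting a stripe of
    # data blocks, contrasted with plain replication. Generic parity-based
    # redundancy, not RESAR's specific code; block contents are made up.

    def xor_blocks(blocks):
        """Byte-wise XOR of equal-length blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    # A stripe of 4 data blocks plus 1 parity block: 25% storage overhead,
    # versus 100% for mirroring or 200% for triple replication.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = xor_blocks(data)

    # Simulate the failure of one device: block 2 is lost.
    surviving = data[:2] + data[3:]

    # Any single missing block is the XOR of the parity and the survivors.
    recovered = xor_blocks(surviving + [parity])
    assert recovered == data[2]

The same arithmetic scales: protecting eight data blocks by triple replication requires sixteen extra blocks, whereas an 8+2 parity layout requires only two while still tolerating a second failure, which is the cost gap the paragraph above describes.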
Cloud computing and data centers are taking hold as technologies with great promise for cheaper, more flexible, and more energy-efficient information processing. RESAR enables cheaper, more reliable, automated, and more easily scaled storage systems. RESAR offers a novel graph representation of a failure-tolerance scheme that allows the construction of flexible, dynamically reconfigurable, parity-based redundancy schemes that are well suited to cloud storage infrastructure. By offering the benefits of more complex erasure-coding schemes whilst remaining simple and efficient, RESAR offers a new path to self-organizing, large-scale storage systems. The resulting systems are more maintainable, more easily reconfigured for increased levels of reliability on demand, and more cost-effective. This efficiency further extends to reduced maintenance and energy demands.
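As a rough illustration of the graph idea, the sketch below treats parity blocks as vertices and each data block as an edge joining the two parity blocks that protect it, so that growing or rewiring the layout becomes a local graph operation. The class, names, and layout here are assumptions made for this example and do not reproduce RESAR's actual scheme.

    # Illustrative sketch only: viewing a two-failure-tolerant, parity-based
    # layout as a graph, with parity blocks as vertices and each data block as
    # an edge joining the two parity blocks that cover it. Structure and names
    # are assumptions for illustration, not the layout RESAR actually uses.

    from collections import defaultdict

    class ParityGraph:
        def __init__(self):
            self.stripes = defaultdict(set)  # parity block -> data blocks it covers

        def add_data_block(self, block, parity_a, parity_b):
            """Attach a data block as an edge between two parity vertices."""
            self.stripes[parity_a].add(block)
            self.stripes[parity_b].add(block)

        def parities_for(self, block):
            """Parity blocks that must be updated when 'block' changes."""
            return [p for p, members in self.stripes.items() if block in members]

    layout = ParityGraph()
    layout.add_data_block("d0", "p0", "p1")
    layout.add_data_block("d1", "p1", "p2")
    layout.add_data_block("d2", "p2", "p0")

    # Restructuring on the fly amounts to adding or rewiring edges, e.g.
    # growing the layout with a new parity vertex without touching existing stripes:
    layout.add_data_block("d3", "p2", "p3")
    print(layout.parities_for("d1"))  # ['p1', 'p2']

Under such a representation, reconfiguring for higher reliability or migrating data for maintenance only touches the edges involved, which is one way the on-the-fly restructuring described above could remain simple at scale.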