Today's explosive growth in data places storage at the center of the information universe, but data centers are becoming increasingly massive and expensive to operate. Existing systems rely heavily on replicating data across multiple disks to guard against loss from device failure, but storing several full copies multiplies the space consumed, which drastically increases hardware costs and especially operating costs (building space, power, cooling, and maintenance). The goal of this research project is to develop new erasure coding technology, in lieu of simple replication, to realize space-efficient, energy-efficient, quantifiably reliable, and massively scalable storage systems.
The intellectual merit lies in developing a feasible approach, built on efficient erasure codes distributed across storage disks and nodes, that can be used to build next-generation green data centers and to transform existing ones. While the space savings of erasure coding are well understood in theory, good and practical erasure codes are hard to design. The PI will construct new, optimal erasure codes by exploiting graph theory, finite geometry, and combinatorics. Some of these codes have the potential to simultaneously achieve the optimal space efficiency and the minimum complexity predicted by theory. To make erasure coding efficient and scalable, the PI will also extend the design from single-level erasure coding to nested/layered coding that provides hierarchical, multi-level protection.
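As a toy illustration of why erasure coding saves space relative to replication, the sketch below uses the simplest possible erasure code: one XOR parity block computed over k data blocks, which tolerates the loss of any single block. It is not the PI's proposed codes, which draw on graph theory, finite geometry, and combinatorics. All function names are hypothetical, and the overhead formula n/k assumes a code that stores n total blocks for every k data blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Hypothetical helper: append one XOR parity block to k data blocks."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(n, k):
    """Raw bytes stored per byte of user data when n blocks hold k data blocks."""
    return n / k


if __name__ == "__main__":
    data = [b"disk", b"node", b"data", b"code"]   # k = 4 equal-length blocks
    stripe = encode(data)                          # n = 5 blocks on 5 devices
    assert recover(stripe, 1) == b"node"           # device 1 fails; block rebuilt
    # 3-way replication (n=3, k=1) versus single parity (n=5, k=4)
    print(storage_overhead(3, 1), storage_overhead(5, 4))  # 3.0 1.25
```

Both layouts above survive one device failure, but the parity layout stores 1.25 bytes per byte of data instead of 3. Practical codes that tolerate multiple failures at low overhead and low complexity are much harder to design, which is the gap this project targets.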
Given the critical importance of energy efficiency, developing green data center technologies will have profound scientific and societal impacts. The range of potential applications is broad, including databases, financial information systems, health and medical information systems, research repositories, and digital libraries.
The proposed research also integrates a meaningful education component, including industrial and international collaboration, curriculum enhancement, and the promotion of undergraduate and graduate research.