In this project, we are investigating the theory, algorithms and applications of fusible data structures for fault-tolerance in parallel or distributed programs with low overhead. Fusible data structures are based on combining partial replication with erasure coding to satisfy three main properties: recovery, space constraint and efficient maintenance. The recovery property ensures that in case of a failure, the fused structure, along with the remaining original data structures, can be used to reconstruct the failed structure. The space constraint ensures that the number of nodes in the fused structures is strictly smaller than the number of nodes in the original structures. Finally, the efficient maintenance property ensures that when any of the original data structures is updated, the fused structure can be updated incrementally using local information about the update and does not need to be entirely recomputed.
The project is carrying out the following three tasks: (i) develop algorithms for fusible data structures for other commonly used structures such as trees and graphs, (ii) develop theory and associated algorithms for multiple faults (iii) apply fusible data structures for recovery of parallel and distributed applications and evaluate the performance benefits of our approach in real applications.