The research focuses on hardware-based techniques for fault detection and recovery in multiprocessor systems. Specifically, the project addresses three aspects of the system's safety and timing correctness: dynamic validation of the correctness of the consistency model, dynamic detection of deadlock and livelock, and autonomic recovery from detected errors, whether transient or permanent.
Improved computer availability will provide a qualitative benefit to a society that increasingly depends on reliable computer systems. The research is motivated by the tremendous economic and human costs of unanticipated downtime and unmonitored malfunctions in safety-critical systems.