Unreliable failure detection has been gaining acceptance as a fundamental paradigm for fault-tolerant distributed computing. This paradigm was originally introduced to circumvent the impossibility of achieving consensus in asynchronous systems with crash failures. The proposed research aims to extend the applicability of failure detection in four ways: (1) It will seek solutions that tolerate both process crashes and link failures, and in particular, solutions that are resilient to network partitioning. (2) It will consider practical problems besides consensus, e.g., atomic commitment and various forms of group membership. (3) It will investigate the extension of failure detection to other models, such as the timed asynchronous model, in order to solve problems whose specifications involve real time. (4) It will explore the use of randomization techniques to enhance the power of failure detection. The goal of this research is to widen the scope of failure detection and to firmly establish it as a core component of fault-tolerant distributed systems.

Budget Start: 1997-09-01
Budget End: 2001-08-31
Fiscal Year: 1997
Total Cost: $230,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850