The goal of this project is to improve the design of fault-tolerant systems, specifically systems with a self-monitoring capability. This capability is highly desirable and is now feasible in modern systems composed of large numbers of networked computing elements. The work rests on the following assumption: some elements of the network are able to evaluate other elements to determine whether they are functioning properly. The results of these evaluations can then be exchanged among elements and used to localize the faulty ones. This method harnesses the large computing power of the system itself to achieve fault tolerance. Current understanding of the design, modeling, and analysis of these systems is inadequate. The project will address this by developing comprehensive probabilistic models, efficient algorithms for processing evaluation data, and new techniques for analyzing the global fault tolerance these systems achieve.
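
The evaluate, exchange, and localize idea described above can be illustrated with a minimal sketch. It is not the project's algorithm. It rests on several assumptions that do not come from the abstract: each element tests the next few elements in a ring, a fault-free tester always reports correctly, a faulty tester always reports the opposite of the truth, and faulty elements are localized by a simple majority vote over the collected reports. The names `simulate_reports` and `diagnose_by_majority` are hypothetical.

```python
from collections import defaultdict


def simulate_reports(n, faulty, tests_per_unit):
    """Build test reports for n elements arranged in a ring (an assumed topology).

    Each element tests the next `tests_per_unit` elements. A report is True
    when the tester judges the tested element to be faulty.
    """
    reports = {}
    for tester in range(n):
        for step in range(1, tests_per_unit + 1):
            tested = (tester + step) % n
            truth = tested in faulty
            # Assumption: a faulty tester inverts every result it reports.
            reports[(tester, tested)] = (not truth) if tester in faulty else truth
    return reports


def diagnose_by_majority(reports):
    """Return the elements that a strict majority of their testers judged faulty."""
    votes = defaultdict(lambda: [0, 0])  # tested -> [faulty votes, total votes]
    for (_tester, tested), judged_faulty in reports.items():
        votes[tested][1] += 1
        if judged_faulty:
            votes[tested][0] += 1
    return {unit for unit, (bad, total) in votes.items() if 2 * bad > total}


if __name__ == "__main__":
    # Seven elements, elements 2 and 5 faulty, three tests per element.
    reports = simulate_reports(7, {2, 5}, 3)
    print(sorted(diagnose_by_majority(reports)))  # prints [2, 5]
```

In this example each element is evaluated by three others, and at most one of those testers is faulty, so the majority vote recovers the faulty set exactly. With more faults, or with faulty testers that behave randomly, a vote of this kind can misdiagnose, which is the sort of behavior a probabilistic model would need to quantify.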

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 8910569
Program Officer: Yechezkel Zalcstein
Budget Start: 1989-08-01
Budget End: 1992-07-31
Fiscal Year: 1989
Total Cost: $41,712
Name: Johns Hopkins University
City: Baltimore
State: MD
Country: United States
Zip Code: 21218