The goal of this project is to improve the design of fault-tolerant systems, focusing on systems that have a self-monitoring capability. This capability is highly desirable and is now feasible in modern systems composed of large numbers of networked computing elements. Our work rests on the following assumption: some elements of the network are able to evaluate other elements to determine whether they are functioning properly. The results of these evaluations can then be exchanged among elements and used to localize the faulty ones. This method harnesses the large computing power of the system itself to achieve fault tolerance. Current understanding of the design, modeling, and analysis of these systems is inadequate. We will address this gap by developing comprehensive probabilistic models, efficient algorithms to process evaluation data, and new techniques to analyze the global fault tolerance achieved by these systems.
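
To make the evaluate, exchange, and localize idea concrete, the following is a minimal sketch and not the project's actual models or algorithms. Everything specific in it is an assumption made for illustration: the ring-style evaluation assignment, the behavior of a faulty evaluator (it always reports the opposite of the truth), the strict-majority decision rule, and the function names simulate_ring_tests and diagnose_by_majority.

```python
from collections import defaultdict


def simulate_ring_tests(n, faulty):
    """Each element i evaluates elements i+1 and i+2 (mod n).

    Assumption for illustration: a fault-free evaluator reports the truth,
    while a faulty evaluator always reports the opposite of the truth.
    Returns a dict mapping (evaluator, evaluated) -> True if reported faulty.
    """
    results = {}
    for i in range(n):
        for offset in (1, 2):
            j = (i + offset) % n
            truth = j in faulty
            results[(i, j)] = truth if i not in faulty else not truth
    return results


def diagnose_by_majority(results):
    """Flag an element when a strict majority of its evaluators report it faulty."""
    votes = defaultdict(lambda: [0, 0])  # evaluated -> [faulty reports, total reports]
    for (_, evaluated), says_faulty in results.items():
        votes[evaluated][1] += 1
        if says_faulty:
            votes[evaluated][0] += 1
    return {e for e, (bad, total) in votes.items() if 2 * bad > total}


# Five elements, with element 2 faulty (hypothetical scenario).
results = simulate_ring_tests(5, faulty={2})
suspects = diagnose_by_majority(results)
print(sorted(suspects))  # prints [2]
```

In this small scenario the faulty element's misleading reports about elements 3 and 4 each produce a tie, which the strict-majority rule resolves in favor of "not faulty". With more faults or other reporting behaviors, a simple rule like this can misdiagnose, which is the kind of outcome that probabilistic models of the evaluation process are meant to quantify.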