As large-scale, distributed systems are used in safety-critical applications, the fault tolerance of these systems becomes a crucial issue. This research focuses on the fundamental problems of diagnosis and fault-tolerant routing. This work will, for the first time, consider the coordination of these important tasks, by making use of diagnostic information to increase routing efficiency and system throughput. The work will target systems possessing a relatively sparse interconnection structure such as grids, meshes, and hypercubes which constitute a large fraction of existing systems and are expected to dominate future applications. The proposed research will involve the integrated design of algorithms for diagnosis and routing, along with evaluations of algorithm performance that are both theoretical and experimental in nature. The theoretical work will utilize probabilistic analysis techniques in order to evaluate performance under recently proposed probabilistic models as well as to examine performance limits for these systems. The experimental work will involve the implementation and testing of the proposed algorithms on 32-node Intel IPSC/2 hypercube. A product of this research should be efficient algorithms for these fundamental problems in distributed fault tolerance that can be used in a wide range of multiprocessor/multicomputer systems.