Two recent developments have dramatically increased the importance of ultra-reliable parallel computing. First, parallel processing has moved from a specialized tool for scientific computation to an instrument for general-purpose computing. Second, computers have begun to control virtually every aspect of human life and are, therefore, increasingly relied upon to provide continuous service. These trends make the development of general but practical techniques for fault-tolerant multiprocessing a necessity. This project focuses on loosely-coupled parallel computers. These systems are ideally suited for ultra-reliable applications because their built-in hardware redundancy can be used to achieve fault tolerance. The project has both basic and applied research components. The basic research component studies new models and algorithms for two problems: fault diagnosis in multiprocessor systems and fault-tolerant routing. The applied research component involves the development of an experimental testbed for fault-tolerant multicomputer systems. A Transputer-based MIMD multicomputer system provides the testbed hardware and low-level operating system. Special-purpose software is developed to provide a system-level fault tolerance framework, a fault simulator, and a data collection tool for the testbed. The testbed allows experimental evaluation of the system-level fault tolerance mechanisms developed in the basic research component and is also made available to other research groups working on multiprocessor system fault tolerance.