With the increasing availability of message based distributed parallel computing systems, the difficulties associated with constructing parallel algorithms and mapping them onto parallel architectures have grown in importance. Currently, the algorithm development and programming process typically includes the explicit mapping of algorithm subtasks to processors. This problem is typically dealt with manually and often results in inefficient computation and communication load balancing which in turn leads to an overall reduction in the speedups potentially obtainable. The problem is exacerbated when distributed architectures are considered where processors and/or communications links are subject to failure, yet must continue processing at reasonable rates with the remaining computing resources. This requires that process and data checkpointing be done effectively and that, on detection and location of failures, the system be reconfigured and tasks reallocated. This work focuses on the checkpointing and task reallocation problems. This research program is aimed at investigating approaches to allocating tasks to processors in distributed computer systems whose reliability characteristics have been characterized. The goal is to ensure effective use of the remaining distributed resources after failure (processor or link) has occurred through the development of distributed checkpointing and fast task reallocation schemes.