The increasing reliance on computers for virtually all applications drives the need for fault tolerance to prevent disruptions in the delivery of desired services. However, approaches to fault tolerance are acceptable only if high costs and degradation of system performance are avoided. The checkpoint-based fault recovery schemes currently in use are effective, but they require customized solutions and carry a high cost penalty, both monetarily and in long recovery times. A novel cache-based approach that provides high-performance, low-cost checkpointing-based recovery in distributed systems will be investigated. The principles of using caches to provide stable checkpoints will be established, and the architectural concepts will be developed. The research focuses on developing techniques to analyze and control the cache attributes of stability and frequency of checkpoints, and on establishing response/overhead characteristics. Protocols for cache-based recovery (roll-backward and roll-forward) over varied fault instances will be developed and analyzed. This research will use existing system caches to provide automatic checkpoint establishment and a low-overhead fault recovery approach that is transparent to the user and the OS; the result will be a viable and effective fault tolerance scheme for general computing systems.