This research will investigate algorithms for building reliable shared memory systems on distributed memory architectures. Reliability becomes crucial as the number of components in such architectures grows, since a typical computation performs a large amount of work that can ill afford to be lost in a failure. The research is divided into two stages. The first stage is the design of a collection of such algorithms. The second stage is the evaluation of the algorithms using simulations, analytic models, and actual implementations. The collection of algorithms will address the tradeoff between recovery cost and overhead during normal execution. The algorithms will also distinguish between two models of shared memory: one in which the entire process address space is shared, and one in which application-defined segments of memory are shared. Performance evaluation is a crucial component of this research, since recovery cost and overhead are closely related to the pattern of communication in an actual shared memory system. Address traces from real application programs will be used to simulate the performance of the reliability algorithms. Analytic models will be used to extend the results beyond these applications. Finally, implementations of the algorithms will provide actual performance figures and verify the analysis and simulation. The algorithms borrow techniques from reliability research on message passing systems. The availability of shared memory applications, however, allows the overhead of the reliability algorithms to be estimated and the algorithms to be compared realistically.
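
The tradeoff between normal-execution overhead and recovery cost, and the idea of driving a simulation from an address trace, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the `Access` record layout, the `simulate` function, the periodic checkpoint policy (save every dirty page after a fixed number of accesses), and the two cost measures (pages saved as overhead, accesses since the last checkpoint as worst-case lost work).

```python
from typing import Iterable, NamedTuple


class Access(NamedTuple):
    """One record of a hypothetical shared memory address trace."""
    node: int   # processor that issued the access
    op: str     # "R" for read, "W" for write
    page: int   # shared page that was touched


def simulate(trace: Iterable[Access], interval: int) -> dict:
    """Replay a trace under a simple periodic checkpoint policy (assumed).

    Every `interval` accesses, all pages written since the previous
    checkpoint are counted as saved. Pages saved stand in for overhead
    during normal execution; the longest run of accesses without a
    checkpoint stands in for the work lost if a failure occurs.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    dirty = set()          # pages written since the last checkpoint
    pages_saved = 0        # overhead proxy
    checkpoints = 0
    since_checkpoint = 0   # accesses since the last checkpoint
    worst_case_lost = 0    # recovery cost proxy

    for access in trace:
        if access.op == "W":
            dirty.add(access.page)
        since_checkpoint += 1
        worst_case_lost = max(worst_case_lost, since_checkpoint)

        if since_checkpoint == interval:
            pages_saved += len(dirty)
            dirty.clear()
            checkpoints += 1
            since_checkpoint = 0

    return {
        "checkpoints": checkpoints,
        "pages_saved": pages_saved,
        "worst_case_lost": worst_case_lost,
    }


# A tiny made-up trace; a real study would read traces of applications.
TRACE = [
    Access(0, "W", 1),
    Access(1, "W", 1),
    Access(0, "R", 2),
    Access(1, "W", 3),
    Access(0, "W", 1),
]

frequent = simulate(TRACE, interval=1)  # high overhead, little lost work
rare = simulate(TRACE, interval=5)      # low overhead, more lost work
```

On this made-up trace, checkpointing after every access saves more pages in total but bounds the lost work to a single access, while checkpointing once saves fewer pages and risks losing the whole run. The sketch ignores coherence traffic, logging, and per-node state, all of which a real evaluation would have to model.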

Budget Start: 1991-05-15
Budget End: 1994-04-30
Fiscal Year: 1991
Total Cost: $126,982
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820