Prior research has examined applications that use shared memory on single-site systems and the extension of those applications to a distributed environment. The approach was to modify the underlying operating system to support a new memory management facility called distributed shared memory (DSM). This research will relax the assumptions of that work, which characterized the system as having a "tight" degree of sharing among a small number of reliable, well-behaved, communicating sites. These assumptions may not be appropriate for a typical large-scale distributed computing environment. Site failures and network partitions are reasonably common failure modes in some workstation environments, yet prior work does not specifically address either communication breakages or site failures. This project addresses a manageable subset of these issues. Three issues are of concern in creating a reliable DSM system. The first is to determine how sensible it is for DSM to accommodate reliability extensions, and for which applications this support is appropriate. The second is to design and implement changes to support reliability. It is only through the design, implementation, and use of this facility that one can gain exposure to analyzable failure modes. The third is to study the performance of the reliability mechanism using a "fault induction" testing strategy.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
9209405
Program Officer
Anand R. Tripathi
Project Start
Project End
Budget Start
1992-09-15
Budget End
1995-08-31
Support Year
Fiscal Year
1992
Total Cost
$189,908
Indirect Cost
Name
University of California Riverside
Department
Type
DUNS #
City
Riverside
State
CA
Country
United States
Zip Code
92521