This is a continuation of the PI's research on the design of small- to medium-scale shared-memory multiprocessors, where the number of CPUs ranges from a few to several hundred. The emphasis is on the management of the memory hierarchy. Innovative architectural features, such as multi-level cache hierarchies and interconnection networks with memory in the switches, are studied. An important component of the research is providing qualitative and quantitative evaluations through trace-driven and synthetic simulations. In the case of multi-level cache hierarchies, the emphasis is on three topics: the design and evaluation of a two-level virtual-real cache hierarchy; the effect of inclusion properties on cache coherence protocols and performance in bus-hierarchy-based architectures; and techniques for speeding up trace-driven simulation (inclusion properties, trace reduction, parallel simulation). A second area of research covers techniques and architectural enhancements to reduce memory latency in architectures where processors and memory modules are connected by a multi-stage interconnection network. The first part of this research considers caching as a viable alternative; it studies improved self-invalidation cache coherence schemes and compares their performance and cost with those of previously published protocols. The second part continues the PI's investigation of the introduction of memory in the switches of the interconnection network.