This research focuses on the design and evaluation of hardware assets and software techniques that improve the performance of the memory hierarchy in high-performance systems. Processor speed has improved steadily over the last decade, but memory latency and bandwidth have progressed at a slower pace. Several techniques to reduce or tolerate high memory latencies have recently surfaced. Some of these techniques are evaluated using trace-driven and instruction-level simulations of benchmarks. A major thrust of the effort is to improve the simulation techniques themselves, e.g., by using selective sampling and parallel discrete event simulation, so that performance results can be obtained faster. The parallel simulation experiments also serve as a test bed for customizing programs for parallel execution. Additional projects that continue the present research include the design of cache coherence protocols with various block sizes for transfer and coherence. These protocols are geared toward reducing contention in the interconnect of shared-memory multiprocessors and decreasing the amount of traffic caused by sharing. Software techniques pursue similar goals, such as compiler directives for bypassing the cache and for placing data at the appropriate levels of the memory hierarchy.
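
As a rough illustration of trace-driven simulation with selective sampling, the sketch below replays an address trace through a small direct-mapped cache and compares the miss ratio of the full trace with an estimate taken from periodic windows. Everything in it is an assumption made for illustration and is not taken from the project: the cache organization, the names `DirectMappedCache`, `full_miss_ratio`, `sampled_miss_ratio`, and `synthetic_trace`, the window, period, and warm-up parameters, and the synthetic trace itself.

```python
import random


class DirectMappedCache:
    """Hypothetical direct-mapped cache model (illustrative parameters)."""

    def __init__(self, num_sets=64, block_size=32):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets  # one tag per set, None means empty

    def access(self, addr):
        """Return True on a hit, False on a miss; a miss fills the block."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


def full_miss_ratio(trace, **cache_args):
    """Simulate every reference in the trace and return the miss ratio."""
    if not trace:
        return 0.0
    cache = DirectMappedCache(**cache_args)
    misses = sum(not cache.access(addr) for addr in trace)
    return misses / len(trace)


def sampled_miss_ratio(trace, window=2000, period=10, warmup=500, **cache_args):
    """Simulate one window out of every `period` windows.

    The first `warmup` references of each window update the cache but are
    not counted, which is one simple way to limit cold-start bias.
    """
    cache = DirectMappedCache(**cache_args)
    misses = counted = 0
    for start in range(0, len(trace), window * period):
        for i, addr in enumerate(trace[start:start + window]):
            hit = cache.access(addr)
            if i >= warmup:
                counted += 1
                misses += not hit
    return misses / counted if counted else 0.0


def synthetic_trace(n, seed=0):
    """Made-up trace: 90% of references hit a small hot region."""
    rng = random.Random(seed)
    return [
        rng.randrange(0, 4096) if rng.random() < 0.9 else rng.randrange(0, 1 << 20)
        for _ in range(n)
    ]


if __name__ == "__main__":
    trace = synthetic_trace(200_000)
    print("full   :", full_miss_ratio(trace))
    print("sampled:", sampled_miss_ratio(trace))
```

With these default parameters the sampled run touches roughly one tenth of the references. The trade-off is estimation error, which depends on the window length, the warm-up length, and how stationary the workload is.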