To produce the dramatic increases in computer system performance needed to solve the "grand challenge" computational problems, it will be necessary to incorporate more parallel processing technology into new computer designs. A major impediment to achieving high performance in a multiprocessor, however, is the delay encountered when accessing data memory. Current techniques for reducing this memory delay, such as data prefetching, vector memory fetches, and private data caches, have not been as effective in multiprocessors as in traditional uniprocessors because they fail to use all of the available information. In particular, these techniques typically rely only on information available at run time, which is insufficient in the complex environment of a multiprocessor system. The primary objective of this research project is to develop and analyze new techniques for improving multiprocessor memory performance by integrating hardware and software strategies that exploit all of the available memory-referencing information. The project will combine semantic information available at compile time with dynamic information available at run time to improve data prefetching, processor scheduling, data placement, and cache coherence enforcement. The new techniques generated through this research will be validated using a unique combination of trace-driven simulations, mathematical models, software prototypes running on actual parallel machines, and the implementation of new algorithms in a parallelizing compiler.
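
To illustrate the kind of compiler-directed data prefetching this summary refers to, the following minimal sketch (in C, using GCC's __builtin_prefetch intrinsic) shows how semantic knowledge of a loop's access pattern, available at compile time, can be used to request data several iterations before it is needed. The function name, the prefetch distance, and the choice of this particular intrinsic are illustrative assumptions, not part of the project description.

```c
/*
 * Illustrative sketch only: a compiler that can see from the loop structure
 * (compile-time semantic information) that b[idx[i]] will be needed a few
 * iterations ahead can insert software prefetches so the data arrives in the
 * cache before it is used. PREFETCH_DISTANCE and __builtin_prefetch are
 * assumptions for illustration, not the project's actual mechanism.
 */
#include <stddef.h>

#define PREFETCH_DISTANCE 8   /* how many iterations ahead to prefetch */

double sum_gather(const double *b, const int *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Prefetch the element that will be read PREFETCH_DISTANCE
               iterations from now: read-only access, moderate locality. */
            __builtin_prefetch(&b[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += b[idx[i]];
    }
    return sum;
}
```

In an integrated hardware/software scheme of the kind described above, dynamic run-time information (for example, observed memory latencies or cache-miss behavior on the multiprocessor) could be used to tune a parameter such as the prefetch distance, which the compiler alone cannot determine statically.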