The next generation of high-performance machines will be shared-memory machines with deep memory hierarchies. Dense matrix computations must execute efficiently on these platforms, because they account for a considerable portion of the running time of engineering and scientific simulations. However, there is as yet little experience in writing portable software, or in generating code automatically, for machines with deep memory hierarchies. Existing libraries such as LAPACK target uniprocessors with a two-level memory hierarchy and must be rewritten for machines with deeper memory hierarchies. Existing compiler techniques focus mainly on so-called perfectly nested loops and are inadequate even for two-level memory hierarchies. The PIs have recently developed a novel approach, called data shackling, to overcome the limitations of existing techniques. In this project they extend this approach to uniprocessors with a deep memory hierarchy and to shared-memory multiprocessors.