The most pressing obstacle to achieving high computer performance is that CPUs compute faster than the computer can deliver data to them. Because programs usually exhibit regular and repetitive access patterns, current computers cache recently accessed data in a hierarchy of fast, flat, inclusive memories to improve performance. Technology trends are making the data delivery problem much worse and will force changes to memory design, such as partitioned cache levels. We propose a rich software/hardware interface to manage next-generation caches. We propose new compiler algorithms that translate programs to perform better on this new hardware, and new hardware designs that enable the compiler to inform the hardware of current and future data access patterns. These new techniques will go well beyond the current minimalist load/store interface between the architecture and the compiler to achieve high performance. The result will enable end users of computer systems to solve their problems more quickly and easily. The research will have broad impact on the design and implementation of software and hardware cache architectures. It will add new and enhanced compiler, tool, and simulation components to the national research infrastructure. This work will also train graduate students in advanced research.