Both microprocessor performance and on-chip densities have been growing exponentially for decades, and this trend shows no signs of abatement. However, interchip communication is not improving nearly as rapidly; if present trends continue, computation will increasingly be limited by pin-bandwidth constraints or by the cost of exotic packaging. This project focuses on architectures that address these emerging bandwidth limitations in order to build cost-effective, high-performance systems. In particular, we are investigating (1) future memory hierarchies, exploring alternatives at the levels of physical design, logical organization, and system design; (2) DataScalar architectures, which exploit redundant computation to minimize communication requirements; and (3) hybrid parallel execution, which combines the DataScalar approach with parallel threads to support multiple computation models simultaneously.