3D stacked integration of CPU, GPU and DRAM dies, vertically interconnected by Through-Silicon Vias (TSVs), is emerging as a key enabling technology for the parallel and scalable computing systems of tomorrow. Such a 3D Heterogeneous Processor (3DHP) is expected to deliver much higher bandwidth, lower latency and lower power consumption, breaking through the power and bandwidth walls. Despite these significant benefits, 3DHP comes with new domain-specific challenges that have never been fully explored or addressed. Significantly higher power density, thinned substrates and the low thermal conductivity of inter-layer dielectric material all make thermal management a serious problem that threatens the overall reliability and performance of 3DHP. This project aims to address this thermal-integrity issue through a holistic cross-layer approach.
Three major thermal-integrity issues, one at each of the physical, architecture and runtime layers of the system, will be investigated together with their correlations by a team of three PIs with the necessary background and expertise. The proposed cross-layer approach includes: 1) a self-calibrated on-chip temperature/stress co-sensor framework at the physical layer, 2) an adaptive Error Detection & Correction (EDAC) and DRAM refresh engine at the architecture layer for reliable storage and transfer of data among the CPU, GPU and DRAM dies, and 3) a Dynamic Thermal Reliability Management (DTRM) framework at the runtime layer for fine-grained control of the interaction between workloads and hardware resources. The techniques at the three layers will be tightly integrated so that they reinforce one another. The research in this project will result in a solid thermal-integrity design and simulation framework for viable 3DHP-based parallel and scalable computing systems.