As CMOS scaling continues, shrinking feature sizes and rising power densities are accelerating the onset of wear-out, or aging-related, hard failures in processors. Current lower-level solution strategies will likely be inadequate to address this lifetime reliability problem. This research advocates higher-level, microarchitectural solutions for processor lifetime reliability. The first component of this work is the development and validation of microarchitecture-level models, metrics, and tools that incorporate key failure mechanisms and their scaling behavior. The second component develops novel architectural solutions for the lifetime reliability problem, including dynamic reliability management and selective structural redundancy.
The performance benefits from CMOS technology scaling over the last several decades have enabled the information revolution that has affected virtually every aspect of society. The problem of lifetime reliability addressed in this proposal is one of the key impediments to continued benefits from CMOS scaling. The proposed work seeks to develop a fundamentally new approach to this problem that will help meet the reliability goals critical for all processor manufacturers. This work is in collaboration with researchers from IBM, who will provide needed industrial expertise as well as a path for technology transfer.