Technology trends and economic factors are driving the widespread move to mainstream computing and communication systems built entirely on commodity hardware and operating systems. Yet the premium that we as a society place on the reliability of such systems has grown in proportion to our reliance on them for the smooth operation of our lives. Soft errors resulting from single-event effects (SEEs) are an important, and possibly dominant, failure mode that impacts the reliability of these mainstream commodity systems.
This research will develop low-cost SEE-reliability-aware and SEE-reliability-driven design solutions that use optimization to maximize robustness to SEEs, a process commonly termed SEE-hardening. SEE-hardening is an attractive low-cost way to increase reliability because it requires no runtime support from either the hardware or the operating system. It can also complement traditional fault detection and tolerance techniques and reduce their overhead. The SEE-hardening optimization algorithms resulting from this work will provide seamless tradeoffs between SEE-hardness and area, delay, and power, enabling cost-effective solutions matched to the criticality and reliability requirements of the target application over its lifetime. A major impact of this research will be to enable ubiquitous, low-cost, highly reliable computing by extending its reach to domains that lack the financial resources to acquire custom solutions. Through academic and industry collaborations, this project will develop an integrated testbed and web-based resources to facilitate broad research in reliable system design, an area that is rapidly gaining in importance and interest.