Future microprocessors will consist of billions of nanoscale transistors organized as multi-core, multithreaded microarchitectures. Because nanoscale transistors are sensitive to external events and manufacturing variability, there is a non-negligible probability that one or more faults will occur among these billions of transistors and affect one of many threads. High data-integrity and availability requirements make reliability as important for computers as performance, power consumption, and yield. This project studies techniques for characterizing and mathematically modeling the vulnerability of system-level components (i.e., at the microarchitecture, OS, and program levels) to soft errors.
Today's design methodologies optimize the performance and power of multi-core and/or multithreaded architectures, but largely ignore reliability in the presence of soft errors. An important and urgent research task is to develop frameworks, models, and techniques to characterize and estimate the deleterious impact of soft errors. This research addresses that challenge by 1) developing a unified, reliability-aware simulation framework to quantify the microarchitecture soft-error vulnerability of simultaneous-multithreading and multi-core systems consisting of a wide range of heterogeneous hardware and software components; and 2) creating fast and accurate analytical models to estimate and forecast the soft-error vulnerabilities of hardware and software components without lengthy, detailed simulations.
Frameworks that can quantitatively study soft-error vulnerability will enable reliability-aware design and research for emerging simultaneous-multithreading and multi-core architectures. The PIs will use the concepts, tools, techniques, and other results of this research project to introduce graduate and undergraduate students to the nature of soft errors and their impact on execution environments. These teaching activities will lead to improvements in courses on computer architecture, fault-tolerant computing, and nanocomputing. The tools developed in this project are accessible and usable over the Web, using equipment and middleware developed by the PIs' laboratory. This makes it straightforward for other academics and engineers to use them in their own work.