As feature sizes in microprocessor devices continue to scale down, the devices' susceptibility to hardware faults is increasing. Resilient system design is less cost-critical in high-end business applications, where redundant hardware is an acceptable solution, but inexpensive consumer electronics are expected to largely need fault tolerance as well. Approaches that double or triple system cost by replicating hardware are likely too expensive for consumer electronics such as phones, tablets, and laptops. This project will investigate a new way of modeling and supporting hardware fault tolerance in which software and hardware work together cooperatively. The key idea is the design of a System Vulnerability Model (SVM), which approximates the vulnerability of an application to a hardware error. Using the model, each layer of the system offers knobs that can be tuned to control overall system vulnerability. This allows the design of systems that inexpensively provide the right amount of fault tolerance for a given application.
The results of this project could impact how future computer systems provide fault tolerance. Ultimately, such impact benefits society by helping sustain the demand for reliable computing devices, a significant driver of the economy. In the immediate future, the project will support the training of graduate students in this critical area and the enhancement of the undergraduate and graduate curriculum at North Carolina State University to include topics in fault tolerance.