Device physics, manufacturing, and engineering challenges in process scaling are making it significantly harder to produce reliable transistors for future technologies. Many academic experts, industry consortia, and research panels have warned that future generations of silicon technology are likely to be much less reliable, and that multi-core chips whose cores fail in the field due to faults in silicon are around the corner. As reliability declines, the energy efficiency of individual transistors is not keeping up with the increase in transistor density. These two trends portend a perfect storm: as improvements in transistor energy efficiency slow down, transistors are becoming highly unpredictable, which will force further inefficiencies. Addressing hardware reliability is a fundamental problem for microprocessors and hence for sustaining the IT revolution. This project looks at mechanisms for allowing chips and the higher levels of software to continue working even when devices fail. The basic question the project addresses is how to detect when chips fail.

The core idea that this project builds upon is the principle of sampling. Instead of checking for failures all the time, the idea is to check for device failures only during a periodic sampling window. The project investigates formal models, hardware implementation, and evaluation to understand the effect of device failures and the impact of the detection techniques.
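
The trade-off behind a periodic sampling window can be illustrated with a small Python sketch. It is not the project's model or hardware design. It assumes a permanent fault that is caught on the first cycle in which checking is active, and the names `period`, `window`, and the three functions are hypothetical, introduced only for illustration.

```python
def checking_enabled(cycle: int, period: int, window: int) -> bool:
    """Return True if the (hypothetical) checker is active on this cycle.

    Assumption: checking runs for the first `window` cycles of every
    `period` cycles and is switched off for the rest.
    """
    if not 0 < window <= period:
        raise ValueError("need 0 < window <= period")
    return cycle % period < window


def detection_cycle(fault_cycle: int, period: int, window: int) -> int:
    """Return the first cycle at which a fault is detected.

    Assumption: the fault is permanent and is caught on the first
    cycle at or after `fault_cycle` on which checking is enabled.
    """
    if not 0 < window <= period:
        raise ValueError("need 0 < window <= period")
    offset = fault_cycle % period
    if offset < window:
        # The fault appears while a sampling window is open.
        return fault_cycle
    # Otherwise detection waits until the next window opens.
    return fault_cycle + (period - offset)


def checking_overhead(period: int, window: int) -> float:
    """Return the fraction of cycles spent checking."""
    if not 0 < window <= period:
        raise ValueError("need 0 < window <= period")
    return window / period
```

Under these assumptions, a wider window raises the checking overhead, while a longer gap between windows raises the worst-case detection latency, which is `period - window` cycles.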

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1117782
Program Officer: M. Mimi McClure
Project Start:
Project End:
Budget Start: 2011-07-01
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2011
Total Cost: $199,999
Indirect Cost:

Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715