The goal of this project is to develop flexible and efficient Runtime and Compiler System (RCS) technologies that cost-effectively detect and recover from hardware faults in upcoming multicore chips. Semiconductor process variations, temperature hot spots, soft errors, and aging will make hardware reliability one of the central concerns in the design of multicore processors. RCS technologies are well suited to meeting this challenge because of their flexibility, their low cost, and their ability to target the errors that actually affect program outcome.

Two important objectives of this project are (1) to avoid full instruction replication, whether within or across threads, which is key to acceptance in the energy- and cost-conscious commodity markets, and (2) to provide knobs for selecting the desired tradeoff between performance and error coverage.

A prototype, SoftCheck, will be implemented for evaluation purposes. A wide range of novel, cost-effective fault detection and correction techniques will be designed and implemented in SoftCheck. The fault-detection techniques will include: (i) exhaustive self-checking, (ii) partial self-checking, (iii) partial cross-thread checking in a multicore environment, and (iv) other cross-cutting, often multiprocessor-related, approaches. The fault-correction techniques will include: (i) disabling clusters within a core, (ii) disabling complete cores, and (iii) dynamic recompilation to use other hardware.
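To make the idea of partial self-checking with a coverage knob more concrete, the following is a minimal sketch, not SoftCheck's actual design. It assumes a toy three-address instruction format, a simple stride-based policy that marks roughly a `coverage` fraction of instructions for redundant recomputation and comparison, and a transient fault that corrupts only the primary copy of a result. The names `select_checked`, `run`, and `FaultDetected` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical three-address instruction: (dest, op, src1, src2)
Instr = Tuple[str, str, str, str]

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


class FaultDetected(Exception):
    """Raised when a checked instruction's shadow result disagrees."""


def select_checked(program: List[Instr], coverage: float) -> List[bool]:
    """Mark roughly a `coverage` fraction of instructions for self-checking.

    Assumed policy: a deterministic stride, so instruction i is checked when
    int((i + 1) * coverage) > int(i * coverage). coverage=1.0 checks every
    instruction (full replication); coverage=0.0 checks none.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    return [int((i + 1) * coverage) > int(i * coverage) for i in range(len(program))]


def run(program: List[Instr], checked: List[bool], regs: Dict[str, int],
        fault_at: Optional[int] = None) -> Dict[str, int]:
    """Execute the program, recomputing and comparing checked instructions."""
    regs = dict(regs)
    for i, (dest, op, a, b) in enumerate(program):
        result = OPS[op](regs[a], regs[b])
        if i == fault_at:
            result ^= 1  # simulated single-bit upset in the primary result only
        if checked[i]:
            shadow = OPS[op](regs[a], regs[b])  # redundant recomputation
            if shadow != result:
                raise FaultDetected(f"mismatch at instruction {i}")
        regs[dest] = result
    return regs


if __name__ == "__main__":
    program = [
        ("t0", "add", "x", "y"),
        ("t1", "mul", "t0", "x"),
        ("t2", "sub", "t1", "y"),
        ("out", "add", "t2", "t0"),
    ]
    regs = {"x": 3, "y": 4}
    print(run(program, select_checked(program, 1.0), regs)["out"])  # 24
    # With half coverage, a fault in an unchecked instruction escapes:
    print(run(program, select_checked(program, 0.5), regs, fault_at=0)["out"])  # 20
```

The second call shows the tradeoff the knob controls: with `coverage=0.5`, the corrupted `t0` is never compared, and later checks recompute from the already-corrupted register, so the error silently changes the output. Raising `coverage` to 1.0 catches the same fault at the cost of recomputing every instruction.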

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0615273
Program Officer: Mohamed G. Gouda
Budget Start: 2006-08-01
Budget End: 2009-07-31
Fiscal Year: 2006
Total Cost: $109,926
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820