In recent decades, microprocessor performance has been increasing exponentially, due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, their lower threshold voltages and tighter noise margins make them less reliable, rendering processors that use them more susceptible to transient faults caused by energetic particles striking the chip. Such faults can corrupt computations, crash computers, and cause heavy economic damage. Indeed, Sun Microsystems, Cypress Semiconductor and Hewlett-Packard have all recently acknowledged massive failures at client sites due to transient faults.

This project addresses several basic scientific questions: How does one build software systems that operate on faulty hardware, yet provide ironclad reliability guarantees? For what fault models can these guarantees be provided? Can one prove that a given implementation does indeed tolerate all faults described by the model? Driven in part by the answers to these scientific questions, this project will produce a trustworthy, flexible and efficient computing platform that tolerates transient faults. The multidisciplinary project team will do this by developing: (1) programming language-level reliability specifications so consumers can dictate the level of reliability they need, (2) reliability-preserving compilation and optimization techniques that improve the performance of reliable code while ensuring correctness, (3) automatic, machine-level verifiers so compiler-generated code can be proven reliable, (4) new software-modulated fault tolerance techniques at the hardware/software boundary to implement the reliability specifications, and finally (5) microarchitectural optimizations that explore trade-offs between reliability, performance, power, and cost.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0627650
Program Officer: Samuel M. Weber
Project Start:
Project End:
Budget Start: 2006-09-01
Budget End: 2011-08-31
Support Year:
Fiscal Year: 2006
Total Cost: $1,100,000
Indirect Cost:
Name: Princeton University
Department:
Type:
DUNS #:
City: Princeton
State: NJ
Country: United States
Zip Code: 08540