In recent decades, microprocessor performance has increased exponentially, due in large part to smaller and faster transistors enabled by improved fabrication technology. While such transistors yield performance enhancements, their lower threshold voltages and tighter noise margins make them less reliable, rendering processors that use them more susceptible to transient faults caused by energetic particles striking the chip. Such faults can corrupt computations, crash computers, and cause significant economic damage. Indeed, Sun Microsystems, Cypress Semiconductor, and Hewlett-Packard have all recently acknowledged major failures at client sites due to transient faults.
This project addresses several basic scientific questions: How does one build software systems that operate on faulty hardware yet provide ironclad reliability guarantees? For what fault models can these guarantees be provided? Can one prove that a given implementation does indeed tolerate all faults described by the model? Driven in part by the answers to these scientific questions, this project will produce a trustworthy, flexible, and efficient computing platform that tolerates transient faults. The multidisciplinary project team will do this by developing: (1) programming language-level reliability specifications so consumers can dictate the level of reliability they need, (2) reliability-preserving compilation and optimization techniques that improve the performance of reliable code while ensuring correctness, (3) automatic, machine-level verifiers so compiler-generated code can be proven reliable, (4) new software-modulated fault tolerance techniques at the hardware/software boundary to implement the reliability specifications, and finally (5) microarchitectural optimizations that explore trade-offs between reliability, performance, power, and cost.