The Duke FaultFinder Project seeks to provide the first hardware mechanisms for dynamically verifying the correctness of shared-memory multiprocessor systems, rather than only necessary properties of correct behavior. Correctness of such a design is defined by its memory consistency model. FaultFinder will dynamically detect violations of the specified memory consistency model, which is the highest level of error detection possible in hardware. FaultFinder mechanisms will detect hardware errors at the system level (e.g., a violation of consistency), unlike existing schemes that detect only localized errors (e.g., a bit flip in a message). Combining FaultFinder error detection with existing hardware mechanisms for checkpoint/recovery of shared-memory multiprocessor systems enables the system to guarantee correct behavior.
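
To make "violation of the memory consistency model" concrete, the following is a small software sketch, not the project's hardware mechanism. It assumes the specified model is sequential consistency and uses the classic two-thread store-buffering test; all names (`THREAD_0`, `sc_outcomes`, `violates_sc`) are hypothetical and chosen only for illustration. It enumerates every interleaving that preserves program order and flags an observed result that no such interleaving can produce.

```python
from itertools import combinations

# Each thread is a list of operations in program order (illustrative encoding):
# ("st", addr, value) stores value; ("ld", addr, reg) loads addr into reg.
THREAD_0 = [("st", "x", 1), ("ld", "y", "r0")]
THREAD_1 = [("st", "y", 1), ("ld", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]


def run(schedule):
    """Execute one total order against a single memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}  # assumption: all locations start at 0
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "st":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return (regs["r0"], regs["r1"])


def sc_outcomes(t0, t1):
    """All (r0, r1) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(t0, t1)}


def violates_sc(observed, t0=THREAD_0, t1=THREAD_1):
    """True if the observed (r0, r1) cannot arise from any legal interleaving."""
    return observed not in sc_outcomes(t0, t1)
```

In this sketch, an observed result of `(0, 0)` is flagged as a violation, because no sequentially consistent interleaving lets both loads miss both stores. A check on individual messages would not necessarily catch such an error, since each message could be intact while the overall ordering is wrong.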

As society has come to rely increasingly on computer systems for important infrastructure, computer engineers have not correspondingly improved their ability to detect faults in these systems. While recent advances in hardware checkpoint/recovery have improved computer system availability, a system recovery mechanism can recover only from errors that are detected. Currently, computer systems cannot detect whether their memory system is behaving correctly. The Duke FaultFinder Project seeks to provide the first hardware mechanisms for comprehensive error detection in computer systems. Achieving this goal would provide a qualitative benefit to a society that depends on computer availability.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0309164
Program Officer: Timothy M. Pinkston
Project Start:
Project End:
Budget Start: 2003-07-15
Budget End: 2006-06-30
Support Year:
Fiscal Year: 2003
Total Cost: $114,422
Indirect Cost:

Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705