Software failures in server applications pose a significant threat to system availability. Because perfect software is not attainable, this research focuses on tolerating and recovering from errors by exploiting software elasticity: the ability of ordinary code to recover from certain failures when low-level faults are masked by the operating system or by appropriate instrumentation. Software elasticity is exploited by introducing rescue points, locations in application code that handle programmer-anticipated failures, which are automatically repurposed and tested so that they can safely enable recovery from a larger class of unanticipated faults. Rescue points recover software from unknown faults while maintaining system integrity and availability by mimicking system behavior under known error conditions. They are identified using fuzzing, created using a checkpoint-restart mechanism, and tested, then injected into production code using binary patching. This approach masks failures so that the program can continue executing while minimizing undesirable side effects, enabling application recovery and software self-healing.

Budget Start: 2009-09-01
Budget End: 2013-08-31
Fiscal Year: 2009
Total Cost: $450,000
Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027