This project starts from the assumption that software and hardware faults cannot be entirely prevented, and seeks to discover new mechanisms by which a computing system can recover from faults in its components and resume predictable operation in a timely manner. The focus is on embedded computing systems, such as those that control our transportation and power systems. One novel aspect of this work is that even low-level aspects of the operating system are assumed to be potentially faulty and are designed to recover when a fault occurs. A "Superglue" component connection language and runtime support system provides a unified framework for fault detection, isolation, and recovery. The research includes investigations of mechanisms that track the state required to reconstruct a given part of a system if it should fail, and of policies for reconstructing this state quickly enough that the impact of the recovery process on the physical world can be tolerated.

Success of this research will potentially benefit large segments of society. The physical infrastructure upon which many aspects of our lives depend is increasingly reliant on software for control. The complexity of the software in these embedded systems, which interact with the physical world, is increasing with the complexity of the tasks performed and the features required. With this mounting complexity the probability of faults increases, while our increasing dependency on these systems makes the consequences of failure more severe. Faults in computing systems may manifest as brief interruptions of service -- for example, the flickering of control panels -- or as massive system failures, in the worst case leading to loss of life. Therefore, the resilience and robustness of these computing systems are of critical importance.

The PI will also continue and expand his efforts to connect this research with graduate and undergraduate education, including courses on robotics, and with outreach to the Washington, DC public schools.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 1149675
Program Officer: Marilyn McClure
Project Start:
Project End:
Budget Start: 2012-02-01
Budget End: 2018-01-31
Support Year:
Fiscal Year: 2011
Total Cost: $426,000
Indirect Cost:
Name: George Washington University
Department:
Type:
DUNS #:
City: Washington
State: DC
Country: United States
Zip Code: 20052