Systems today must cope with failures induced by many factors outside the control of the organization producing the software: faults in infrastructure and components developed by third parties, unpredictable loads, and variable resources. Modern systems must therefore take increasing responsibility for detecting and repairing problems at runtime. Fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization: the ability to identify the source of a problem so that appropriate repair actions can be taken, either by a human operator or by automated mechanisms.

In this research we are developing new foundations for run-time fault diagnosis and localization. To do this we are extending and synthesizing recent advances in two areas. The first is the use of architecture models for monitoring a system at runtime. The second is the use of spectrum-based reasoning for fault localization (SFL). SFL is a lightweight technique that takes as its input a form of trace abstraction and produces a list of likely fault candidates, ordered by the probability that each is the true explanation of the fault. It has been used with impressive results at design time, but thus far it has not been exploited at runtime in the context of architecture-based monitoring and diagnosis.
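
To make the shape of SFL's input and output concrete, the following is a minimal sketch, not the project's method. It assumes each observed transaction is abstracted to the set of components it touched plus a pass/fail verdict, and it ranks components with the Ochiai similarity coefficient, a common and simple SFL scoring choice that yields a suspiciousness score rather than a true probability. The function name `ochiai_ranking` and the component names are hypothetical.

```python
from math import sqrt


def ochiai_ranking(spectra, outcomes):
    """Rank components by Ochiai suspiciousness.

    spectra:  list of sets; each set holds the (hypothetical) component
              names involved in one observed transaction.
    outcomes: list of bools, True when that transaction failed.
    Returns (component, score) pairs, most suspicious first.
    """
    components = set().union(*spectra) if spectra else set()
    total_failed = sum(outcomes)
    scores = {}
    for comp in components:
        # Failed and passed transactions that involved this component.
        failed_with = sum(1 for s, f in zip(spectra, outcomes) if f and comp in s)
        passed_with = sum(1 for s, f in zip(spectra, outcomes) if not f and comp in s)
        denom = sqrt(total_failed * (failed_with + passed_with))
        scores[comp] = failed_with / denom if denom else 0.0
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Assumed example data: four transactions over three components.
spectra = [
    {"web", "db"},
    {"web", "cache"},
    {"web", "db", "cache"},
    {"web", "cache"},
]
outcomes = [True, False, True, False]
ranking = ochiai_ranking(spectra, outcomes)
```

In this example `db` appears in every failing transaction and in no passing one, so it ranks first. A Bayesian variant of SFL would replace the coefficient with posterior probabilities over candidate fault sets, which is closer to the probability-ordered list described above.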

This research will improve the trustworthiness and robustness of modern software systems by providing new techniques for diagnosing faults while a system is running, thereby providing an improved basis for fault detection and resolution.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1116848
Program Officer: Marilyn McClure
Project Start:
Project End:
Budget Start: 2011-08-01
Budget End: 2015-07-31
Support Year:
Fiscal Year: 2011
Total Cost: $450,000
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213