A neglected consequence of the proliferation of scientific data sets and computational services on the Internet is that the scientific community is becoming increasingly dependent on the quality of shared data sets and the reliability of the software used to analyze them. To make it easier for developers of bioinformatics software to ensure its reliability, this research seeks to develop, evaluate, and refine automated techniques to help developers discover emergent reliability problems with deployed software, understand their nature and significance, and diagnose their causes. The approach is based on eliciting structured feedback from users about the problem symptoms observed and then automatically correlating this feedback with far more detailed information about internal program dynamics and input-output mappings. Advanced data mining techniques will be employed in tandem with dynamic program dependence analysis to: pool problem reports from different users; corroborate individual reports; group failures according to their symptoms and causes; and help diagnose the causes of failures. The proposed work has the potential to significantly improve the reliability of bioinformatics applications used by thousands of scientists.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0820217
Program Officer: Sol J. Greenspan
Budget Start: 2008-06-15
Budget End: 2012-05-31
Fiscal Year: 2008
Total Cost: $699,147
Name: Case Western Reserve University
City: Cleveland
State: OH
Country: United States
Zip Code: 44106