This proposal will be awarded using funds made available by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5), and meets the requirements established in Section 2 of the White House Memorandum entitled "Ensuring Responsible Spending of Recovery Act Funds," dated March 20, 2009.

The project "STCI: Middleware for Monitoring and Troubleshooting of Large-Scale Applications on National Cyberinfrastructure" aims to provide robust and scalable workflow monitoring services that track the progress of workflow-based applications as they execute on distributed cyberinfrastructure. New anomaly detection and troubleshooting services will also be developed to alert users to problems with the application and with cyberinfrastructure services, and to allow them to quickly navigate and mine the application's execution records. The foundation of this work is a robust and scalable infrastructure for gathering and distributing performance information. Information flowing through this infrastructure will be stored in high-performance archives and distributed to interested entities through subscription interfaces. Three main services will be developed: 1) an online monitoring service, 2) an anomaly detection service based on dynamic mining of application and cyberinfrastructure logs, and 3) a troubleshooting service that will help trace the source of a failure.

Intellectual Merit: This work will potentially increase scientists' productivity by allowing them to quickly identify problems in an application, thus reducing the time it takes to generate scientifically meaningful results. This work will also make the performance of complex scientific workflows more transparent, which will enable the generation of accurate estimates of overall time to completion, more efficient use of resources, and easier resolution of end-to-end performance problems in collaboration with network and resource providers.

Broader Impact: Scientific communities in astronomy, biology, earthquake science, physics, and other fields will immediately benefit from the proposed system. Because the approach relies on simple, well-defined logging formats, this work is applicable to a range of workflow management systems as well as to sub-components of those systems, such as job managers and data transfer tools.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 0943705
Program Officer: Kevin L. Thompson
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $1,875,831
Indirect Cost:

Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089