Modern distributed systems are extremely complex, due in large part to individual node complexity, node unreliability and asynchrony, and unpredictable network message delays and orderings. Development is further complicated both by the presence of multiple, potentially incompatible versions of these systems and by the need to build systems that are not only correct but also high-performing. Prior testing and simulation frameworks are characterized either by extensive manual effort or by automated search for violations of a binary decision problem: the presence or absence of a bug.
We are developing automated and interactive techniques for helping developers understand the behavior of distributed systems implementations. By leveraging these existing frameworks, and by instrumenting implementations in structured, straightforward ways, we are building development tools focused on understanding system behavior rather than merely identifying correctness errors. This shift in focus will enable more general tools that improve development productivity as well as testing productivity.
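As a hedged illustration of what "structured, straightforward" instrumentation might look like (this is a sketch, not the actual instrumentation API from this work; all names here are hypothetical), an implementation could emit one machine-parseable record per distributed-system event, so that later tools can mine behavior across repeated executions:

```python
import json
import time


def log_event(node_id, kind, payload, out):
    # Hypothetical structured instrumentation hook: one JSON record per
    # event of interest (message send/receive, timer fire, state change),
    # appended to `out` for later offline analysis.
    record = {"node": node_id, "kind": kind, "payload": payload, "ts": time.time()}
    out.append(json.dumps(record, sort_keys=True))
    return record


trace = []
log_event("n1", "send", {"msg": "prepare", "to": "n2"}, trace)
log_event("n2", "recv", {"msg": "prepare", "from": "n1"}, trace)
assert json.loads(trace[0])["kind"] == "send"
```

Because each record is self-describing JSON, the same trace can feed both human inspection and automated mining without format-specific parsers.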
This research is proceeding on three fronts: 1) developing automated tools that use data mining over repeated executions to extract execution behaviors and performance characteristics, 2) developing flexible execution descriptions suitable both for use-case documentation and for automated processing, allowing more intuitive interaction between developers and their tools, and 3) integrating testing tools with revision control systems, enabling multi-version analysis and long-term progress tracking. When complete, this research will reduce the developer effort needed to design, update, and debug distributed systems, and may inspire a new class of systems debuggers that analyze not just correctness, but also performance and complexity.
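The first front, mining repeated executions, could in spirit resemble the following sketch (a toy illustration under assumed inputs, not the research tools themselves): summarize each execution trace as event-type frequencies, then flag executions whose profile diverges from the pooled average on many event types.

```python
from collections import Counter


def behavior_profile(events):
    # Summarize one execution as frequencies of each event type.
    return Counter(events)


def outlier_runs(runs, threshold=0.6):
    # Pool profiles across all executions to form an average profile,
    # then flag any run that differs from the average on more than
    # `threshold` of the observed event types. Threshold is a toy knob.
    pooled = Counter()
    for events in runs:
        pooled.update(behavior_profile(events))
    avg = {k: v / len(runs) for k, v in pooled.items()}
    flagged = []
    for i, events in enumerate(runs):
        prof = behavior_profile(events)
        diffs = sum(
            1 for k in avg if abs(prof.get(k, 0) - avg[k]) > avg[k] * threshold
        )
        if diffs > len(avg) * threshold:
            flagged.append(i)
    return flagged


runs = [
    ["send", "recv", "commit"],
    ["send", "recv", "commit"],
    ["send", "timeout", "retry", "timeout", "retry"],  # anomalous execution
]
print(outlier_runs(runs))  # the third run (index 2) is flagged
```

Even this crude frequency comparison illustrates the shift in focus: the tool characterizes how runs behave relative to one another, rather than checking a single run against a binary correctness predicate.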