This project is developing scalable mechanisms to debug, monitor, and assess the quality of the complex distributed systems that form the backbone of modern software infrastructure. These methods are necessarily highly automated: they reason about the operation of distributed systems while treating the components of such systems as black boxes. As a result, the methods require no source code, programmer annotation, or developer input to troubleshoot a distributed system. Instead, they rely on detailed information gleaned from the pre-existing log messages that are nearly ubiquitous in large-scale distributed systems and on data extracted via binary analysis of components as they run.
These new methods, termed telescopic analysis, combine the ability to collect extremely detailed, low-level information about systems executing large numbers of requests with "big data" analysis that mines insights and creates models of system operation from the corpus of detailed observations. Telescopic analysis uses targeted, sample-based logging and/or binary analysis to generate substantial quantities of high-precision data about specific runs of the system under observation. It then combines these observations into models that capture the aggregate behavior of the system. Comparing the general model with the detailed observations of each run reveals how that run conforms to or deviates from the common operation of the system. The project is also developing tools and query languages for interpreting the results of such comparisons, both in aggregate and for specific runs, to support performance analysis, debugging of data-quality failures, understanding of outlier behavior, and "what-if" analysis.
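The core workflow described above (combine many detailed per-run observations into an aggregate model, then compare each run against that model to surface deviations) can be sketched in miniature. This is only an illustrative toy, not the project's actual tooling: the run data, the per-operation latency model (mean and standard deviation), and the deviation threshold are all hypothetical stand-ins for the sampled-log and binary-analysis data the abstract describes.

```python
import statistics

# Hypothetical per-run observations: each run maps an operation name to a
# measured latency in milliseconds. In the real setting these would be
# gleaned from sampled log messages or binary analysis of running components.
runs = [
    {"read": 12.0, "write": 30.0},
    {"read": 11.5, "write": 29.0},
    {"read": 12.3, "write": 31.0},
    {"read": 12.1, "write": 30.5},
    {"read": 11.8, "write": 29.5},
    {"read": 55.0, "write": 30.2},  # a run that deviates on "read"
]

def build_model(runs):
    """Aggregate per-run observations into a per-operation (mean, stdev) model."""
    samples = {}
    for run in runs:
        for op, latency in run.items():
            samples.setdefault(op, []).append(latency)
    return {op: (statistics.mean(v), statistics.stdev(v))
            for op, v in samples.items()}

def deviations(run, model, threshold=1.8):
    """Return operations in one run whose latency sits more than
    `threshold` standard deviations from the aggregate mean.
    The threshold is an illustrative choice, not a project parameter."""
    flagged = {}
    for op, latency in run.items():
        mean, stdev = model[op]
        if stdev > 0 and abs(latency - mean) / stdev > threshold:
            flagged[op] = latency
    return flagged

model = build_model(runs)
outlier_runs = [i for i, run in enumerate(runs) if deviations(run, model)]
```

Here the "telescope" is collapsed to a single metric; the project's methods would build far richer models (e.g., of request structure and timing) from the same observe-aggregate-compare pattern.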