This project addresses the next major impediment to the continued adoption of "big-data" analytics---the management of their life cycle, which includes debugging, tuning, and auditing. Today, data-intensive analytics are improving operations across many industries, translating terabytes of raw data into actionable analyses. Taking advantage of big data will be necessary to sustain competitive advantages in areas ranging from power generation and retail to oil exploration, manufacturing, scientific research, and national security. However, the extreme scalability of these data processing architectures hides inefficiencies and obfuscates performance analysis, creating both obvious and hidden costs to their adoption. Tuning and debugging large data-intensive workflows is currently a black art that consists mostly of tedious manual analysis.
This research seeks to dramatically alter how data scientists design and debug their analytics, sidestepping this authoring and deployment bottleneck. In particular, the PIs are developing scalable, efficient architectures for capturing fine-grained data lineage---information that tracks the use of data through the analytic pipeline---from a range of data-intensive scalable computing (DISC) systems. Such lineage serves as a basis for discovering inefficiencies and suggesting optimizations via step-wise debugging, fault tracing, anomaly detection, and lineage-driven data cleaning and data mining. The development and open-source release of such lineage-capture and analysis platforms promises to dramatically accelerate the adoption of big-data analytics.
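To make the idea of fine-grained lineage concrete, the following is a minimal sketch of how per-record provenance can be threaded through a toy map/filter/reduce pipeline so that an output can be traced back to the raw inputs that produced it. The names (`Record`, `source`, `map_op`, and so on) are illustrative assumptions, not the API of any particular DISC system.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    value: int
    # Set of raw-input IDs this record derives from.
    lineage: frozenset = field(default_factory=frozenset)

def source(values):
    # Tag each raw input with a unique provenance ID.
    return [Record(v, frozenset({i})) for i, v in enumerate(values)]

def map_op(records, fn):
    # A transformed record inherits the lineage of its input.
    return [Record(fn(r.value), r.lineage) for r in records]

def filter_op(records, pred):
    return [r for r in records if pred(r.value)]

def reduce_op(records):
    # An aggregate's lineage is the union of its contributors' lineages.
    total = sum(r.value for r in records)
    lineage = frozenset().union(*(r.lineage for r in records)) if records else frozenset()
    return Record(total, lineage)

# Trace an output back to the inputs that contributed to it.
data = source([3, 8, 5, 12])
out = reduce_op(filter_op(map_op(data, lambda x: x * 2), lambda x: x > 10))
print(out.value, sorted(out.lineage))  # → 40 [1, 3]
```

Here the final sum traces back only to inputs 1 and 3 (the values 8 and 12), since the others were dropped by the filter; this backward traversal from an anomalous output to its contributing inputs is the primitive underlying the fault tracing and lineage-driven cleaning described above.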