BIGDATA: F: Collaborative Research: Optimizing Log-Structured-Merge-Based Big Data Management Systems

Christidis, Evangelos; Tsotras, Vassilis; Eldawy, Ahmed

Abstract

Modern big data management systems support fast read and write operations based on the unique identifier (key) of a record: they insert key-value pairs quickly and, given a key, quickly return the associated value. To do so, most such systems rely on a Log-Structured Merge (LSM) tree, a structure that batches writes in memory before writing them to persistent storage. This project will study how to efficiently support more sophisticated operations on LSM-based storage systems, that is, operations that do not simply specify the key of a record. Examples include searching for records by their location or time rather than by their key. By optimizing the storage and management of big data, this project could reduce storage costs and energy consumption in data centers. Further, the successful completion of this work will allow users to manage more data with existing hardware infrastructure, which is critical given the new wave of big data generated by sensors and the Internet of Things. The project will capitalize on the student diversity at two Hispanic-Serving Institutions and thus broaden the participation of under-represented groups in the research process.
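As an illustration only (not the project's design and not Apache AsterixDB's implementation), the minimal Python sketch below shows the write-batching idea behind an LSM tree: writes accumulate in an in-memory buffer (the memtable) and are flushed as one immutable sorted run once a threshold is reached, while key lookups check the buffer first and then the runs from newest to oldest. The class name TinyLSM, the memtable_limit threshold, and the naive full-merge compact() method are hypothetical choices made for this sketch.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM store (hypothetical): in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.memtable = {}                    # recent writes, batched in memory
        self.runs = []                        # flushed runs, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch memory until the batch is large enough to flush.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the whole batch at once as a single sorted, immutable run
        # (standing in for a sequential write to persistent storage).
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # The newest data lives in the memtable, so check it first.
        if key in self.memtable:
            return self.memtable[key]
        # Then search runs newest-first so later writes shadow older ones.
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))  # binary search within the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Naive compaction: merge all runs into one, letting newer values win.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


# Usage: two puts fill the memtable and trigger a flush.
db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)
db.put("a", 3)   # newer version of "a" stays in memory
db.flush()       # now two runs exist on "storage"
print(db.get("a"), db.get("b"))  # newest value of "a" shadows the older one
db.compact()     # runs merged into one
```

Production LSM stores additionally log writes for durability, record deletes as tombstones, keep runs on disk with per-run Bloom filters, and merge runs selectively according to a compaction policy rather than all at once; the sketch omits all of this.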

To support richer data modeling and querying capabilities on top of LSM key-value stores, this project will develop novel LSM indexing and access algorithms that support query plans using both primary and secondary LSM components. In addition, it will design and evaluate flow control policies to dampen or eliminate the notoriously bursty data ingestion behavior of LSM-based storage structures. It will also study how to change LSM compaction policies and parameters automatically and dynamically based on the query workload, as well as data-semantics-aware compaction techniques. Because most query optimizers currently treat the LSM storage layer as a black box, the project will also develop novel LSM-aware query optimization techniques. The planned methods will be deployed and evaluated on the open-source Apache AsterixDB system.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1838222
Program Officer: Sylvia Spengler

Budget Start: 2019-01-01
Budget End: 2022-12-31
Fiscal Year: 2018
Total Cost: $1,390,073

Institution: University of California, Riverside, Riverside, CA, United States
