III: Medium: Collaborative Research: Supporting High-Value Analytics on Big Low-Value Data

Carey, Michael

Abstract

A wealth of digital information is being generated through social networks, blogs, online communities, news sources, and mobile applications, as well as a myriad of device-based sources such as smart-home devices and wearable sensors. Data analysts in a number of domains, e.g., government, public health, national security, and public safety, stand to benefit greatly from the ability to perform retrospective as well as interactive analyses over such data. The key feature of this data is that an individual item, such as a tweet or a sensor reading, is low-value by nature. It becomes high-value only when large quantities of it are analyzed together. This project seeks new data management techniques to enable data analysts to process large quantities of such low-value data. The key challenge is to support analytic queries efficiently and interactively, while being aware of the low-value nature of the data, using cost-effective solutions such as cheap commodity hardware.

Support for data analytics over tabular data has been well studied, both for centralized and parallel databases. However, because memory prices now allow the high-value transactional data of a typical enterprise to fit in the memory of a high-end server, most recent work has focused on analytics for memory-resident data. In contrast, this project aims to support analytics over data arising from social, mobile, Web, and IoT data sources. This data is much larger, and its items become high-value only in aggregate, so memory residence is not cost-effective for storage or analysis. The project has three main thrusts. The first thrust focuses on efficient storage and resource-aware query processing for large volumes of data that are nested, semi-structured, and lacking a predefined schema. The second thrust introduces a flexible join framework to handle complex join queries, including joins over spatial, temporal, and textual data, so that multiple datasets can be combined to increase their value. Because big low-value data often involves sequences of events, the third thrust focuses on efficient window query processing; parallel processing of window queries is essential for scaling big low-value data analytics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1954962
Program Officer: Wei-Shinn Ku

Project Start
Project End
Budget Start: 2020-10-01
Budget End: 2023-09-30
Support Year
Fiscal Year: 2019
Total Cost: $600,000
Indirect Cost

University of California Irvine, Irvine, CA, United States
