Sensor data of diverse types and large volumes need to be combined with conventional SQL databases, which provide context and metadata for the sensor data. This combination will lead to a new generation of analytics in a number of areas, such as smart buildings, where analyses draw on building and environmental data collected by sensors. The project argues that this new generation of analytics must rest on the same sound database technology cornerstones as the prior (non-sensor) business intelligence platforms: declarative queries, automatic optimization, efficient storage representations and multiple layers of abstraction, which together yield high productivity for the developer and the analyst. Such productivity is absent from sensor data analytics today because database technology and sensor data processing do not mix well. Productivity is especially low in cases involving (a) many types of sensor data, (b) combinations of sensor data with conventional database data that provide context and (c) many types of analyses. Besides low productivity, the current (limited) state of the art places very high expertise requirements on analysts: they must simultaneously be experts in signal processing, statistics and big data management. The project will deliver a database system for sensor data in which the analyst can rapidly develop declarative queries that are automatically optimized. In doing so, the project will deliver the envisioned productivity gains and lower the technical bar for working in this space, enabling many scientists and domain specialists to engage in analytics.

This project argues that the core reason SQL databases fail at managing and analyzing spatiotemporal sensor data is the lack of a critical abstraction: real-world models, which capture the stochastic processes that generate the measurements. The proposed Plato database system will bring the real-world model concept into SQL databases by treating models (spatiotemporal continuous functions) as first-class citizens. Delivering Plato requires innovative solutions to multiple problems. The project will design and implement (a) a model-aware data model and corresponding query language features that allow seamless combination of conventional SQL querying with statistical signal processing, (b) learning algorithms that learn the components of reduced-noise, additive model representations, which are naturally compressions of the original data, (c) query processing algorithms that operate directly on the compressed representations and use only the relatively few bits necessary for the required confidence of the analytics, and (d) semiautomated algorithms that further compress the model representations by exploiting the dependencies (mutual entropy) between the models. Finally, the project will exercise the resulting system on large-scale statistical sensor data processing cases, such as those presented by the UCSD Energy Dashboard, measuring both the lines of code and the runtime efficiency of the analyses.
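
Items (b) and (c) can be illustrated with a deliberately small sketch. It is not Plato's design: it assumes regularly sampled readings, a seasonal period known in advance, and a toy additive model (an overall level plus a repeating seasonal profile), and the names `SeasonalModel`, `fit_seasonal` and `range_avg` are hypothetical. Fitting keeps one value per phase and discards the residual noise, so the model is a compression of the raw series; a range aggregate is then computed from the model parameters alone, and the largest fitting residual bounds how far that answer can be from the same aggregate over the raw data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SeasonalModel:
    """Hypothetical toy additive model: x[t] ~= level + season[t % period]."""
    level: float          # overall level of the signal
    season: List[float]   # seasonal profile, one value per phase, summing to ~0
    max_error: float      # largest absolute residual observed while fitting

    def value(self, t: int) -> float:
        """Evaluate the model at sample index t (no raw data needed)."""
        return self.level + self.season[t % len(self.season)]

    def range_sum(self, start: int, stop: int) -> float:
        """Sum of model values for t in [start, stop) in O(period) time.

        Every block of `period` consecutive samples visits each phase once,
        so only the leftover partial block is summed phase by phase.
        """
        period = len(self.season)
        n = stop - start
        full, extra = divmod(n, period)
        total = n * self.level + full * sum(self.season)
        for i in range(extra):
            total += self.season[(start + i) % period]
        return total

    def range_avg(self, start: int, stop: int) -> float:
        """Average over [start, stop); within max_error of the true average
        over the fitted samples, because every residual is at most max_error."""
        return self.range_sum(start, stop) / (stop - start)


def fit_seasonal(samples: Sequence[float], period: int) -> SeasonalModel:
    """Least-squares fit of the toy model: each phase's fitted value is the
    mean of the samples at that phase; the residual noise is discarded."""
    if not 0 < period <= len(samples):
        raise ValueError("need 0 < period <= len(samples)")
    phase_means = [
        sum(samples[k::period]) / len(samples[k::period]) for k in range(period)
    ]
    level = sum(phase_means) / period
    season = [m - level for m in phase_means]
    max_error = max(
        abs(x - (level + season[t % period])) for t, x in enumerate(samples)
    )
    return SeasonalModel(level, season, max_error)
```

For instance, a week of hourly readings (168 samples) fitted with `period=24` is replaced by 25 numbers (the level plus 24 seasonal values) and an error bound, and an average over any range of those hours is answered from the model alone. A real system would learn richer components and choose the precision per query, which this sketch does not attempt.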

For further information, see the project website at www.db.ucsd.edu/NSF14Plato

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1447943
Program Officer: Sylvia Spengler
Budget Start: 2014-09-01
Budget End: 2018-12-31
Fiscal Year: 2014
Total Cost: $1,100,000
Organization: University of California San Diego
City: La Jolla
State: CA
Country: United States
Zip Code: 92093