The demand to analyze large datasets is increasing at a rate faster than our ability to compute meaningful insights from the data. Despite the advances in query processing, query optimization, and novel hardware, efficient query evaluation remains a major challenge for the database community to date. This project will explore new ways to design query processing algorithms by using techniques from information theory. The field of information theory has been developed over the last seven decades and has found many applications ranging from communications to machine learning, but it has not been used yet in a significant way to support query processing. This project transfers techniques from information theory to the task of processing massive amounts of data.
The project introduces a new paradigm for query evaluation, called "from proofs to algorithms," which maps an information-theoretic proof of a query's upper bounds into optimal query evaluation algorithms. The project will pursue four thrusts. In the first thrust it will use information theory to derive new bounds on the query answer based on database statistics. The second thrust will develop a mapping from the information theoretic inequalities to relational operators, and from the complete proof of the query size to a complete query plan with provably optimal runtime. The third thrust will extend these techniques to finer grained statistics, including sketches, distinct values, and heavy hitters. Finally, the fourth thrust will extend the algorithm to queries with projections, aggregates, and group-by's.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.