Database management systems (DBMSs) are designed to be general-purpose tools that support a wide variety of applications, from banking to social networking to scientific discovery. To improve the performance of such applications, researchers have leveraged the unique characteristics of application areas to build domain-specific DBMSs that outperform traditional implementations. Performing such specialization by hand is labor-intensive, complex, and error-prone. The intellectual merit of this project is to advance the state of the art in application-specific DBMS design by investigating techniques to perform such domain specialization automatically. As part of this project's broader impacts, the lessons and techniques learned will be integrated into programming languages and classes that the PI routinely teaches.

Specifically, this proposal aims to leverage recent advances in programming systems and data management research to build tools that can automatically understand database application semantics. Given such knowledge, the goals of this project are to 1) create tools that can automatically optimize the specific set of queries that the application can potentially issue, and prove that the optimized queries are semantically equivalent to the inputs; 2) investigate techniques to automatically select the optimal framework (in terms of execution time, resources required, etc.) to execute the queries issued by the application; and 3) devise new languages for programmers to express their data consistency needs when queries are to be executed across a distributed set of nodes, and build an implementation of such languages. All software artifacts developed in this project are released to the public, with plans to incorporate their usage in both the undergraduate and graduate curricula. In addition, because part of the project is to collect and study the shortcomings of real-world database applications, the collected applications are gathered into a publicly accessible repository that researchers and practitioners in the field can use to experiment and to reproduce the results.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2027575
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2020-04-01
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2020
Total Cost: $399,692
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94710