In a world where decisions are increasingly driven by data, data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential to this skill set is the ability to work with structured data. The standard "tools of the trade" for manipulating structured data include the venerable and ubiquitous SQL language as well as popular libraries heavily influenced by relational query languages, e.g., dplyr for R and the DataFrame APIs of pandas and Spark. Learning and debugging relational queries, however, pose challenges to novices. Even computer science students with programming backgrounds are often not used to thinking in terms of logic (e.g., when writing SQL queries) or functional programming (e.g., when writing queries using operators that resemble relational algebra). This project proposes to build a system called HNRQ (Helping Novices Learn and Debug Relational Queries) to address these challenges by explaining why a query is wrong and by helping users fix their queries and learn in the process.

The first step in the project is to automatically construct small database instances as counterexamples to illustrate why queries return wrong results, and to allow users to trace query execution over these instances. Going beyond convincing users that the queries are wrong, HNRQ further aims to guide users towards the next level of understanding: helping them generalize from specific counterexamples to semantic descriptions of what causes wrong results, and providing useful hints on how to approach the problems correctly. This ambitious goal will push the boundaries of existing research and will likely lead to the development of novel methodologies for providing explanations and hints. The project will make HNRQ general and practical by embracing the full complexity of real-world query languages and by delivering interactive performance, so that users can experiment with changes to queries and database instances, observe their effects, and obtain automated feedback and hints in real time, even for complex queries and large databases. The project plans to evaluate HNRQ not only through user studies but also by measuring its direct impact on learning outcomes. The project is committed to making HNRQ open-source and easy to adopt by educators around the world.
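
To make the idea of a small counterexample concrete, the following is a minimal sketch and not HNRQ's actual method. It assumes a hypothetical task ("list students who have not taken CS101"), a hypothetical two-table schema (`Student`, `Enroll`), and a naive brute-force search over tiny instances drawn from a fixed pool of candidate rows. All names, rows, and queries are illustrative assumptions.

```python
import sqlite3
from itertools import combinations

# Hypothetical task: list students who have NOT taken CS101.
REFERENCE_QUERY = """
    SELECT name FROM Student
    WHERE id NOT IN (SELECT sid FROM Enroll WHERE course = 'CS101')
"""
# A typical novice attempt: it filters enrollment rows, not students.
WRONG_QUERY = """
    SELECT DISTINCT s.name FROM Student s
    JOIN Enroll e ON s.id = e.sid
    WHERE e.course <> 'CS101'
"""

# Pool of candidate rows from which small instances are drawn (illustrative).
STUDENT_ROWS = [(1, "Ann"), (2, "Bob")]
ENROLL_ROWS = [(1, "CS101"), (1, "CS102"), (2, "CS101"), (2, "CS102")]


def run(query, students, enrolls):
    """Evaluate a query on a fresh in-memory database; return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE Enroll (sid INTEGER, course TEXT)")
        conn.executemany("INSERT INTO Student VALUES (?, ?)", students)
        conn.executemany("INSERT INTO Enroll VALUES (?, ?)", enrolls)
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


def find_counterexample():
    """Return the first smallest instance on which the two queries disagree."""
    max_rows = len(STUDENT_ROWS) + len(ENROLL_ROWS)
    for total in range(max_rows + 1):  # try instances with fewer rows first
        for n_students in range(total + 1):
            n_enrolls = total - n_students
            for students in combinations(STUDENT_ROWS, n_students):
                for enrolls in combinations(ENROLL_ROWS, n_enrolls):
                    expected = run(REFERENCE_QUERY, students, enrolls)
                    actual = run(WRONG_QUERY, students, enrolls)
                    if expected != actual:
                        return students, enrolls, expected, actual
    return None  # the queries agree on every instance in the search space


if __name__ == "__main__":
    students, enrolls, expected, actual = find_counterexample()
    print("Student rows:", students)
    print("Enroll rows: ", enrolls)
    print("Reference query returns:", expected)
    print("Wrong query returns:    ", actual)
```

Under these assumptions, the search stops at a one-row instance: a single student with no enrollments. The reference query returns that student, while the join-based attempt returns nothing. An instance this small is easy to trace by hand, which is the property the counterexamples described above are meant to have. Exhaustive enumeration is only practical for a toy search space like this one.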

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2008107
Program Officer: Wei-Shinn Ku
Project Start:
Project End:
Budget Start: 2020-10-01
Budget End: 2023-09-30
Support Year:
Fiscal Year: 2020
Total Cost: $333,711
Indirect Cost:
Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705