Today's applications frequently feature massive and heterogeneous data and complicated computational requirements. There have been many efforts towards efficient parallel query processing and optimization. However, the full potential of parallelism has not been realized by existing techniques and frameworks in scaling to massive datasets, especially for applications that inherently demand recursive data accesses.
The project offers a theoretical methodology for tackling the problem of parallel query evaluation on massive data. The PI conjectures that, to maximize the parallelizability of generic queries (e.g., queries used frequently in analytical and transactional applications), one needs to take inherently parallelizable queries as the basic unit of study. She identifies symmetric queries as a class of queries that are potentially highly parallelizable. She will use them as a stepping stone to study parallelizable query languages and will leverage the findings to design techniques for efficient evaluation of generic queries. In particular, the project focuses on three separate, yet highly related tasks: (1) design and study a set of query languages whose queries are symmetric, investigate the properties of these languages, and propose and prove theoretical bounds on their computational complexity in terms of scaling and data skew; (2) investigate and propose data structures and algorithms for efficiently evaluating queries of these languages in parallel; and (3) propose strategies, including query rewriting and optimization techniques, for efficient evaluation of arbitrary queries, based on the new data structures and algorithms that result from (2).
During the exploratory phase of this project, the PI is conducting research activities in key areas of all three aforementioned tasks. These activities will build the theoretical foundation, foster strong collaborations with experts in related areas, and lay the groundwork for an effort suitable for a full-size XPS project. The research results of this project will benefit both the database and the parallel computing communities by offering a new way to approach the problem of integrating the techniques of the two fields.
The research methodology and algorithms developed are to be integrated into the undergraduate- and graduate-level database courses the PI teaches, as course materials and topics for course projects. Graduate students are supported by the project as research assistants. The PI works with various initiatives to recruit and encourage undergraduate students to participate in research activities.