A modern database server typically runs on a cluster of machines with a large amount of RAM, to ensure fast query responses when the commonly accessed data (or even the whole database) can fit in main memory. Data may reside in remote or local memory, or in one of several levels of cache; each type of memory has its own characteristic size and performance properties. This project will develop query processing algorithms and a query processing system tailored to such memory hierarchies. Analytic database systems have emerged as a key technology for extracting actionable information from "big data" collections ranging from medicine to science to business. Improvements in the performance of core database operations achieved by the project will have impact on many application domains. The project will also contribute to education by contributing technology for use in teaching database system implementation techniques.

The project will develop a database architecture with efficient partitioning as the core operation, using different partitioning techniques at each level of the memory hierarchy. In-place partitioning will be employed where appropriate to avoid allocating extra memory, with only a small decrease in partitioning speed. Partitioning will be used as a building block for other operators such as sorting, joins and aggregation. A variety of database algorithms will be implemented, taking advantage of the efficiency of partitioning to perform well at scale. Network transfer volumes will be reduced using specialized join algorithms, since such transfers are likely to be the bottleneck for queries involving large distributed joins. These algorithms will form the basis of a database system prototype to be developed during the course of the project. The project will provide new techniques to exploit modern machines for efficient analytic query processing. The proposed system will significantly improve the throughput of data-intensive queries.

For further information see the project web site at: www.cs.columbia.edu/~kar/cpumem.html

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1422488
Program Officer
Maria Zemankova
Project Start
Project End
Budget Start
2014-09-01
Budget End
2018-08-31
Support Year
Fiscal Year
2014
Total Cost
$499,998
Indirect Cost
Name
Columbia University
Department
Type
DUNS #
City
New York
State
NY
Country
United States
Zip Code
10027