For over three decades, the B-tree has been the data structure of choice for maintaining searchable, ordered data on disk. Such B-trees are sometimes referred to as "cache-aware" B-trees because they are aware of, and optimize for, one particular block size (such as the disk block size) in the memory hierarchy. In contrast, recent theoretical results show how, in principle, to build a "cache-oblivious" B-tree, which simultaneously optimizes for all block sizes with no explicit measurement or tuning. Although recent experiments show that cache-oblivious B-trees can outperform cache-aware B-trees for any given block size, sometimes by two orders of magnitude, cache-oblivious B-trees are not yet practical. This project aims to understand how the theoretical promise of cache-oblivious B-trees can be realized in practice.

Building a practical library for cache-oblivious search trees is a difficult systems engineering problem. Existing designs are complex and lack concurrency. They provide amortized performance guarantees rather than worst-case performance guarantees. They do not support the transactional semantics required by databases and file systems. By combining theoretical insights with modern software technology, such as memory mapping, this research project hopes to provide a practical library for cache-oblivious search trees that offers provable guarantees of performance.

Project Start
Project End
Budget Start
2006-08-15
Budget End
2009-07-31
Support Year
Fiscal Year
2005
Total Cost
$275,000
Indirect Cost
Name
Massachusetts Institute of Technology
Department
Type
DUNS #
City
Cambridge
State
MA
Country
United States
Zip Code
02139