This research project aims to understand and develop systems for maintaining superlinear indexes for streaming data. A superlinear index provides search capability over an abstract space that cannot easily be linearized (totally ordered). In contrast, a linear index, typified by a B-tree, supports point and range queries on totally ordered data.

Examples of superlinear indexes include multidimensional indexes, which can be over a geometric domain, such as geographic data, or which can be over multiple linear indexes; and full text queries, which can include searching for a particular word or substring.

The superlinear indexes found in today's databases cannot support high rates of insertion. On traditional mechanical disk drives, the existing superlinear indexes can only support about one hundred insertions per second in the worst case. For many important applications, that is too slow, and so database users often avoid superlinear indexing. Even traditional linear indexes based on B-trees cannot support the high insertion rates demanded by many databases.

This research investigates streaming superlinear indexes, that is, indexes that efficiently support full text or multidimensional queries, and can be updated at speeds that are related to disk bandwidth rather than seeks per second.

Among the significant research issues are the following: (1) design efficient files structures for streaming superlinear indexes; (2) investigate how streaming superlinear indexes might pave the way to improved file systems; (3) determine whether cache-oblivious algorithms technology can enhance streaming superlinear indexes; and (4) program complex data structures for transactions and recovery.

If successful, this research will show how to build filesystems that achieve dramatically better performance than today's B-tree-based filesystems, how to maintain rich geometrical data and multidimensional nongeographical databases in real time, and how to maintain full-text searchable databases in real time. For example, some of today's file systems try to maintain an full-text index to find strings in files quickly, but these systems often fall behind at high data write rates. A streaming superlinear index would allow such a file system to keep up, and would improve the usability of both high-end storage systems and relatively small consumer storage systems that are nonetheless too large to index with today's indexes.

The researchers are developing course materials on streaming indexing technology which will be made freely available under the MIT OpenCourseWare initiative (http://ocw.mit.edu).

Further information on this project may be found at the project web page: http://supertech.csail.mit.edu/superlinear-indexes

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
0937829
Program Officer
Sylvia J. Spengler
Project Start
Project End
Budget Start
2009-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2009
Total Cost
$200,007
Indirect Cost
Name
Rutgers University
Department
Type
DUNS #
City
New Brunswick
State
NJ
Country
United States
Zip Code
08901