Storage capacity and data volume have been doubling every 18 months for the past two decades. A key challenge in building next-generation storage systems is managing massive amounts of feature-rich (non-text) data, which dominates the growing volume of digital information. Comparing noisy, feature-rich data requires fast similarity matching rather than exact matching, so exploring such data requires similarity search rather than exact search. Current file systems are designed for named text files and lack mechanisms for managing feature-rich data. To date, no practical storage system can perform similarity search on noisy, high-dimensional data, and no index engine has been designed for efficient similarity search. This research addresses the problem by studying how to design and implement a content-addressable and content-searchable storage (CASS) system to manage and explore diverse feature-rich data. The system includes a built-in similarity search engine for general-purpose, noisy, high-dimensional metadata, using compact data structures and novel indexing methods. The research will also develop segmentation and feature extraction methods for audio, image, and genomic data, and will develop similarity search benchmarks to evaluate the CASS system.
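
A minimal sketch can illustrate the difference between exact and similarity search on noisy data. It assumes each object has already been reduced to a fixed-length feature vector and uses a brute-force Euclidean nearest-neighbor scan. This scan is a stand-in for illustration only, not the CASS index engine, whose data structures are not described here. The function names, stored keys, and vector values are hypothetical.

```python
import math

def exact_search(store, query):
    """Exact match: return keys whose stored vector is identical to the query."""
    return [key for key, vec in store.items() if vec == query]

def similarity_search(store, query, k=1):
    """Similarity match: return the k keys whose vectors are closest to the query
    under Euclidean distance (brute-force scan over every stored object)."""
    ranked = sorted(store, key=lambda key: math.dist(store[key], query))
    return ranked[:k]

# Hypothetical 3-dimensional feature vectors, e.g. extracted from audio clips.
store = {
    "clip_a": (0.10, 0.80, 0.30),
    "clip_b": (0.90, 0.20, 0.50),
    "clip_c": (0.40, 0.40, 0.90),
}

# A noisy re-recording of clip_a: close to, but not identical to, its stored vector.
noisy_query = (0.12, 0.79, 0.31)

print(exact_search(store, noisy_query))          # []
print(similarity_search(store, noisy_query, 1))  # ['clip_a']
```

Exact search finds nothing because noise perturbs every coordinate, while similarity search still returns the intended object. The brute-force scan costs time linear in the number of stored objects for each query, which is why similarity search at storage scale calls for compact data structures and dedicated indexing methods.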

This research will advance knowledge and understanding of storage system design, including data structures, mechanisms, and APIs for managing, searching, and exploring noisy, high-dimensional, feature-rich data. The research will accelerate the development of next-generation storage systems that will revolutionize how massive amounts of feature-rich data are accessed, searched, explored, and managed in many disciplines.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0509447
Program Officer: Krishna Kant
Budget Start: 2005-07-01
Budget End: 2009-06-30
Fiscal Year: 2005
Total Cost: $900,001
Organization: Princeton University
City: Princeton
State: NJ
Country: United States
Zip Code: 08540