Systems for information storage, gathering, or communication should be designed with careful attention to how the information will eventually be used. Often data are used without regard to their ordering; e.g., most databases are accessed by searching, and order is irrelevant to statistics like means, medians, etc. The observation that order can be irrelevant is powerful because sometimes ignoring order dramatically improves compression. Remarkably, in some situations the reduction in bits can approach 100%. This project seeks to develop theory, algorithms, and applications for communication when ordering is fully or partially irrelevant. The theoretical aspect includes establishing performance bounds when very little is known about the information source prior to encoding and when ordering is partially maintained. The algorithmic focus is on computationally-efficient algorithms with limited buffering requirements. Lowering the required communication rates in networked data gathering could enable cheaper, smaller and lower-power devices and thus hasten the deployment of large-scale and battery-operated sensing systems.

The genesis of this project is the following order reduction: Communicating any nontrivial sequence of n (ordered) symbols requires a number of bits that is linear in n; however, disregarding order lowers the rate to O(log n) when the source alphabet is finite. Results for universal coding over countable alphabets and rate-distortion problems also show large differences between the ordered and unordered communication problems. The project aims to establish fundamental bounds on compressing unordered data (discrete-and continuous-valued sources, with or without full distributional knowledge); develop compression techniques (scalar and vector quantizers, indexing, refinement); and apply data set (as opposed to sequence) compression in practice. Extending the results to partial preservation of order could have important consequences for conventional compression.

Project Start
Project End
Budget Start
2007-09-15
Budget End
2011-08-31
Support Year
Fiscal Year
2007
Total Cost
$285,000
Indirect Cost
Name
Massachusetts Institute of Technology
Department
Type
DUNS #
City
Cambridge
State
MA
Country
United States
Zip Code
02139