Data-centric technologies form an essential part of the nation's economy and infrastructure. As these technologies evolve, the amount of information being collected, stored, and processed is growing at an exponential pace. A sustainable solution for managing this explosion of data should be guided in part by a fundamental understanding of how information can be stored succinctly while still meeting utility and access requirements. This research seeks to quantify basic tradeoffs among information storage, utility, and accessibility.
Information theory provides a framework for quantifying what is and what is not possible in data compression and transmission. The traditional information-theoretic framework, however, does not readily address critical challenges inherent to modern data processing applications and large data sets. In particular, information-theoretic analyses generally ignore complexity constraints and encounter significant difficulties in distributed settings. Building on the principal investigator's recent work on compression under logarithmic loss and compression for similarity queries, this project will address these two issues by investigating rate-distortion tradeoffs with the added dimensions of distributed processing and complexity constraints. Specific goals of the project include (i) characterizing the compressibility of an information source subject to the constraint that elementary queries can be answered in the compressed domain with bounded complexity; and (ii) quantifying the performance of distributed processing (relative to centralized processing) for compression and inference. As a counterpart to the theoretical analyses, the project will also pursue the design of practical algorithms capable of achieving the established information-theoretic limits.
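For concreteness, the logarithmic loss criterion referenced above admits a standard textbook formulation, sketched below in generic notation; the symbols X, \hat{X}, and R(D) are illustrative and are not tied to the project's specific models. Under logarithmic loss, a reproduction \hat{x} is a probability distribution on the source alphabet, and the cost of reproducing a source symbol x is
\[
  d(x, \hat{x}) = \log \frac{1}{\hat{x}(x)},
\]
so that the corresponding single-letter rate-distortion function takes the familiar form
\[
  R(D) = \min_{p(\hat{x} \mid x)\,:\ \mathbb{E}[d(X,\hat{X})] \le D} I(X; \hat{X}).
\]
The rate-distortion tradeoffs studied in this project augment this baseline formulation with distributed processing and complexity constraints.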