As many application domains continue to generate data at exponentially increasing rates, much of the data that is gathered is stored in a compressed format. However, very few classic data processing algorithms have been updated to handle compressed data. This project aims to address this gap by developing (i) algorithms for massive data sets that can directly operate on compressed data; and (ii) compression schemes that are aware of the algorithms that would operate on the data.
In many settings, algorithms that manipulate very large composite objects while interacting only with their succinct descriptions can substantially reduce the time and memory requirements relative to their counterparts that have to work with uncompressed representations of the same data. These performance gains are realized by leveraging highly repetitive or parametrically specified input structures, to enable algorithms to manipulate very large composite objects while interacting only with their compressed descriptions. Anticipated results of the project include new geometric algorithms that solve problems such as convex hull, Voronoi diagrams, nearest points and earth-mover distances when the inputs are in compressed format; new graph algorithms that compute minimum spanning trees, shortest paths, and network flows on compressed input graphs; and new compression-aware data structures that support efficient storing, querying and processing of compressed data. All the algorithmic contributions will be validated with experiments on real and synthetic massive data sets. The resulting algorithms are likely to find application in many different domains including networks, genomics, databases, computer graphics, artificial intelligence, geographic information systems, integrated circuit design, and computer-aided engineering.
Broader Impacts: Compression-aware data processing algorithms and algorithm-aware data compression schemes have applications across a wide range of tasks that involve processing of massive data sets consisting of large data objects (e.g., images, sequences, graphs). The formulations, algorithms, codes, and theories that will be developed and disseminated by this project are likely to contribute to the development of efficient and practical algorithms and data structures that could impact the way in which organizations collect, store, process, such data. The project offers enhanced research based training opportunities for students in an area of considerable theoretical as well as practical significance. Additional information about the project can be found at: www.cs.virginia.edu/robins