Sublinear-space data structures have been of major recent interest, with applications for example in streaming algorithms, distributed computation, randomized linear algebra, and compressed sensing. Small-space solutions fit in fast cache, thus providing time speedups as well, and also require less storage and less bandwidth in distributed settings. This proposal aims to develop novel methods for designing and analyzing sublinear-space data structures for two seemingly different but closely related objects: vectors and graphs. The PIs also plan to teach advanced graduate courses whose topics overlap with the focus of this project. Furthermore, both PIs plan to train and mentor graduate and undergraduate students students in research, organize workshops, and write survey articles. The PIs plan to also participate in outreach activities to non-computer scientists and to K-12 students, and to broaden participation in computing. The research itself may also have industrial impact, being related to databases, network traffic analysis, and data mining.
The PIs will focus on a set of problems that can be captured by the turnstile streaming model: some high-dimensional vector x in R^n receives coordinate-wise updates, which in the case of graphs could have n = (|V| choose 2) (where V is the set of vertices), and edge insert/deletion then corresponds to addition/subtraction from some entry in x. This project aims to further the understanding of fundamental questions related to small-space dynamic data structures for vector updates, and especially as they relate to graph problems. For example:
* Small-space vector update data structures: the insertion-only case: In the insertion-only case, vector updates increment coordinates of x, so that x is a frequency-count vector of the number of occurrences of various items in a data stream. The PIs plan to attack some of the most fundamental problems in this model, such as norm estimation, heavy hitters, and continuous monitoring of statistics.
* Fully dynamic streams and applications to graphs: Many of the best-known small-space dynamic data structures for graph problems operate by reducing to vector-update problems. For example, the only known nearly linear-space algorithm for spectral sparsifiers operates by reduction to l_2 heavy hitters, and algorithms for connectivity, k-edge connectivity, minimum spanning trees, and several others reduce to vector coordinate-sampling problems. Many open problems though still remain, e.g. what is the optimal space complexity for connectivity in dynamic streams? The PIs also plan to investigate several other dynamic graph and hypergraph problems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.