This project focuses on questions related to bounding the space requirements of computation in various settings, and the relationship between computation time and space.

Current computational tasks often involve very large data sets, for example data originating from the internet, or from biological and other scientific databases. The size of the input in such tasks requires special considerations and solutions that work with only small portions of the input at a time: manipulating all of the input at once would be prohibitive even with the latest computer technology. Locally decodable codes and streaming algorithms are motivated by such applications.

Locally decodable codes are error correcting codes with the extra property that, in order to retrieve the correct value of one position of the input with high probability, it is sufficient to read just a small number of positions of the possibly corrupted codeword. So far, the known constructions of such codes with a constant number of queries have very large length with respect to the input size, and there is a large gap between the known upper and lower bounds on the length of codewords, even in the case of 3-query codes. The project further examines the relationship between the length necessary for the codewords, the number of queries allowed for decoding, and the error correcting properties of locally decodable codes.
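To make the model concrete, the sketch below implements the classical Hadamard code, a 2-query locally decodable code. Its length is exponential in the input size, so it is far from the constructions discussed above; the function names and parameters are illustrative only, a minimal sketch of local decoding rather than any construction from this project.

```python
import random

def hadamard_encode(x):
    """Encode bit vector x (length n) as all 2^n inner products <x, a> mod 2."""
    n = len(x)
    return [sum(x[j] & (a >> j) for j in range(n)) & 1 for a in range(2 ** n)]

def local_decode(word, n, i):
    """Recover x_i with 2 queries into a possibly corrupted codeword.

    Pick a uniformly random a and query positions a and a XOR e_i:
    word[a] ^ word[a ^ e_i] = <x, a> + <x, a + e_i> = x_i over GF(2).
    """
    a = random.randrange(2 ** n)
    return word[a] ^ word[a ^ (1 << i)]

if __name__ == "__main__":
    x = [1, 0, 1]
    word = hadamard_encode(x)
    word[5] ^= 1  # adversarially corrupt one of the 8 positions
    votes = sum(local_decode(word, 3, 2) for _ in range(101))
    print("decoded x_2 =", int(votes > 50))  # majority vote boosts correctness
```

Since each of the two queried positions is individually uniform, a single invocation succeeds with probability at least 1 - 2*delta when a delta fraction of the codeword is corrupted; this is the sense in which correctness trades off against error tolerance and codeword length.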

In addition, the project includes proving bounds on storage space in the cell probe model while limiting the number of positions accessed to answer queries about the data, as well as bounding the space requirements of streaming algorithms. Finally, the project addresses the relationship between the size and depth of Boolean circuits necessary to compute a given function, which is directly related to the relationship between the time and space of computation.
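As a standard illustration of the streaming model (not a result of this project), the classical Misra-Gries algorithm below finds frequent elements in a single pass using space that depends only on the parameter k, not on the stream length.

```python
def misra_gries(stream, k):
    """One-pass frequent-elements sketch using at most k counters.

    Every element occurring more than len(stream)/(k+1) times is
    guaranteed to survive in the counter table; space is O(k),
    independent of the stream length.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Decrement all counters; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = [1, 2, 1, 3, 1, 2, 1, 4, 1]
print(misra_gries(stream, k=2))  # 1 occurs > 9/3 times, so it must survive
```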

Project Report

The main focus of the project was the study of space bounded computation in various contexts, motivated by current computational tasks involving large amounts of data. In addition to classical questions of computational complexity, aspects of space bounded computation were examined in the context of coding theory, in particular locally decodable codes. Locally decodable codes are error correcting codes with the extra property that it is sufficient to read just a small number of positions of a possibly corrupted codeword in order to recover any one position of the input. This property is useful when encoding very large data sets, for example the patient records of a hospital. Traditional error correcting methods that provide good error tolerance against arbitrary errors would require processing the entire database (e.g. the data of the entire hospital) even to recover small amounts of information (e.g. individual patient records). Locally decodable codes allow reliable recovery of any individual record by accessing an amount of data not much larger than the desired information. The challenge is to achieve this even when any part of the database can be corrupted by adversarial errors.

To achieve local decodability, it is necessary to use randomness in the decoding procedures. We refer to the probability of returning the correct answer as the correctness of the decoding algorithm. In joint work with Andrew Mills, we showed that achieving slightly larger correctness than the known subexponential length constructions requires exponential codeword length for 3-query codes. Previously, no lower bounds larger than quadratic were known for locally decodable codes with more than 2 queries, even in the case of 3-query linear codes. Our lower bounds hold for linear codes over arbitrary finite fields and for binary nonlinear codes. We also obtained similar results for an arbitrary number of queries, under certain assumptions on the decoding algorithm that have been commonly used in previous constructions. Our results explain the limitations on the correctness of these algorithms.

Batch codes are closely related to the concept of local decodability, as well as to the efficient use of storage space. Batch codes were introduced by Ishai, Kushilevitz, Ostrovsky and Sahai in 2004. A batch code specifies a method to encode a string x into an m-tuple of strings stored on m distinct servers, such that any subset of k symbols from x can be retrieved by reading at most t symbols from each server. The goal is to keep the "load" t small, while also minimizing the total storage space. The problem is motivated by applications to load balancing in distributed storage, private information retrieval and cryptographic protocols. Combinatorial batch codes are purely replication based batch codes, where each server stores a subset of the symbols of the string x. In joint work with Natalia Silberstein, we gave optimal constructions of combinatorial batch codes for a new range of parameters. Our constructions are based on block designs. A toy example of retrieval from a replication based batch code is sketched below.
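The following sketch illustrates retrieval from a purely replication based (combinatorial) batch code with load t = 1: serving the requested symbols from distinct servers amounts to finding a bipartite matching between requests and the servers storing them. The 4-server layout and all parameters are hypothetical, chosen only for illustration; this is not one of the block design based constructions mentioned above.

```python
def retrieve(requests, server_contents):
    """Assign each requested symbol index to a distinct server that stores
    it (load t = 1 per server), via augmenting-path bipartite matching."""
    match = {}  # server -> request currently assigned to it

    def assign(req, seen):
        for s, contents in enumerate(server_contents):
            if req in contents and s not in seen:
                seen.add(s)
                # Use server s if it is free, or if its current request
                # can be rerouted to another server.
                if s not in match or assign(match[s], seen):
                    match[s] = req
                    return True
        return False

    for req in requests:
        if not assign(req, set()):
            raise ValueError(f"cannot serve request {req} with load 1")
    return {req: s for s, req in match.items()}

# Toy replication based batch code: 4 data symbols, 4 servers, each
# symbol stored on two servers; any k = 2 distinct symbols can be
# retrieved reading at most t = 1 symbol from each server.
servers = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]
print(retrieve([0, 2], servers))  # e.g. {0: server 0, 2: server 1}
```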
The depth of Boolean circuits is closely related to the space complexity of the corresponding functions. As part of this project, we studied questions of space bounded computation related to the size versus depth problem; in particular, we considered tradeoffs between simultaneous time and space bounds in the context of Boolean circuit complexity. In joint work with Jing-Tang Jang, we obtained significant improvements over the general bounds for the size versus depth problem for special classes of Boolean circuits. We showed that every layered Boolean circuit of size $s$ can be simulated by a layered Boolean circuit of depth $O(\sqrt{s \log s})$. For planar circuits and synchronous circuits of size $s$, we obtained simulations of depth $O(\sqrt{s})$. Improving any of the above results by polylog factors would immediately improve the bounds for general circuits. We also obtained results implying that the class of languages computed by non-uniform families of polynomial size circuits that have constant size segregators equals non-uniform $NC^1$.

We also considered the complexity of encoding asymptotically good error-correcting codes, which have constant rate and correct a constant fraction of errors. The complexity of encoding good codes has been studied before; it was shown that some popular encoding methods cannot yield good codes under certain assumptions on the complexity of the encoder. In joint work with Kristoffer Hansen, Michal Koucky, Pavel Pudlak and Emanuele Viola, we proved tight bounds on the number of wires in constant depth (unbounded fan-in) circuits computing asymptotically good error-correcting codes. We showed that a quasilinear number of wires is sufficient for computing good codes already in depth 2 circuits with parity gates. This implies that a (necessarily dense) generator matrix for the code can be written as the product of two sparse matrices. For depth 2, our $\Omega(n (\log n/\log\log n)^2)$ lower bound gives the largest known lower bound for computing any linear map. Furthermore, we identified a new class of superconcentrator-like graphs with connectivity properties distinct from previously studied ones.
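To illustrate the structural consequence of the depth 2 upper bound, the sketch below encodes via a two-layer parity circuit, computing x -> A(Bx) over GF(2), so the generator matrix AB is a product of two sparse matrices whose total number of nonzeros equals the number of wires. The random sparse matrices and all dimensions here are assumptions chosen for illustration only; they do not form an asymptotically good code.

```python
import random

def sparse_gf2_matvec(rows, x):
    """Multiply a GF(2) matrix (stored as lists of nonzero column indices
    per row) by bit vector x; cost = number of nonzeros ("wires")."""
    return [sum(x[j] for j in row) & 1 for row in rows]

# Toy depth-2 parity circuit: the middle layer computes y = Bx, the
# output layer computes c = Ay, so the (dense) generator matrix is AB
# while the circuit has only nnz(A) + nnz(B) wires.
random.seed(0)
n, mid, out = 16, 24, 32
B = [random.sample(range(n), 3) for _ in range(mid)]    # 3 wires per middle gate
A = [random.sample(range(mid), 3) for _ in range(out)]  # 3 wires per output gate

x = [random.randint(0, 1) for _ in range(n)]
y = sparse_gf2_matvec(B, x)  # first layer of parity gates
c = sparse_gf2_matvec(A, y)  # second layer of parity gates: the codeword
print(c)
```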

Budget Start: 2010-08-01
Budget End: 2014-07-31
Fiscal Year: 2010
Total Cost: $346,480
Name: University of Texas Austin
City: Austin
State: TX
Country: United States
Zip Code: 78759