The ubiquity of massive data sets presents great challenges. Dealing with massive data is especially formidable when attempting to solve computational problems in which both the input and the output of the computation are large. In such a situation, it would be useful if one could find faster ways of computing just the portion of the output that is required by the user.
This project aims to study "local computation algorithms", namely algorithms that quickly compute only the portions of the output that are required by the user, without performing the full computation. In particular, local computation algorithms view only a minuscule portion of the input. The PI considers a broad-based approach, studying the application of local computation algorithms to a range of problems and settings within algorithm design. The focus of this research is on the question of when local computations can be done in time that is sub-linear in the size of the input and output. The proposed research will develop techniques for constructing such algorithms and for understanding when such algorithms are not possible.
The project will leverage known results from sub-linear time algorithms, which for the most part have focused on the somewhat different setting of computational problems in which the inputs are large but the outputs are small. In addition, the project will investigate well-studied classes of algorithmic techniques and focus on modifying them for use in this new setting. Such classes include algorithmic techniques first developed for parallel and distributed algorithms, as well as the extensively used greedy method. Problems from combinatorial optimization, graph theory and compressibility of strings will be studied.
Classical models of algorithmic analysis assume that the algorithm reads an input, performs a computation, and writes the answer to an output tape. On massive data sets, such computations may not be feasible, as both the input and the output may be too large for a single processor to process. In such a situation, it would be useful if one could find faster ways of computing just the portion of the output that is required by the user. In this project, we investigate the model of "local computation algorithms", which support queries such that after each query by the user to a specified location, the local computation algorithm quickly outputs the value of the output at that location, without knowledge of future requests. When more than one legal output exists for an input, the local computation algorithm must answer queries over time in a way that is consistent with a single legal output. For a given problem, the hope is that the query time and space complexity of a local computation algorithm is proportional to the amount of the solution that is requested by the user. That is, the time and space complexity should be sublinear in the size of the input and output to the problem, and should not require storing the history of all previous queries and answers. It is furthermore desirable that the techniques be amenable to parallelization; that is, different copies of the local computation algorithm should be able to run, accessing the same input, without requiring coordination among their answers. Local computation algorithms satisfying these properties are useful for performing cloud computations on large data sets. This research project develops standard techniques for constructing local computation algorithms. Much of the focus is on combinatorial optimization problems, such as algorithms for maintaining connectivity and low diameter of networks, and finding sparse connected subgraphs.
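To make the model concrete, the following is a minimal sketch of a local computation algorithm for maximal independent set, obtained by locally simulating the greedy method on a random vertex ordering. All names here (GRAPH, SEED, rank, in_mis) and the example graph are illustrative assumptions, not part of the proposal; the key points it demonstrates are that each query explores only a small neighborhood, that answers to different queries are consistent with a single legal output, and that independent copies sharing the same seed agree without coordination.

```python
import hashlib

# Hypothetical bounded-degree graph, given as adjacency lists.
GRAPH = {
    0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3],
}

# Shared randomness: every copy of the algorithm derives the same
# pseudorandom vertex ranks from this seed, so their answers agree.
SEED = b"shared-seed"

def rank(v):
    """Pseudorandom rank of vertex v, computed deterministically
    from the shared seed (no stored state between queries)."""
    return hashlib.sha256(SEED + str(v).encode()).hexdigest()

def in_mis(v):
    """Answer a single query: is v in the maximal independent set
    produced by the greedy algorithm that considers vertices in
    increasing rank order?  v joins iff no lower-ranked neighbor
    joins.  Ranks strictly decrease along the recursion, so each
    query terminates after exploring a local neighborhood only."""
    return all(not in_mis(u) for u in GRAPH[v] if rank(u) < rank(v))

# Queries may arrive in any order; the answers are mutually
# consistent because they all refer to the same greedy execution.
mis = {v for v in GRAPH if in_mis(v)}
```

The same shared-seed idea is what lets many copies of the algorithm serve queries in parallel against one input: no copy needs to remember, or learn about, the queries answered by the others.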
In addition, algorithms for locally decompressing data and for correcting sample data are developed.
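As a simple illustration of local decompression, the following sketch answers queries of the form "what is the i-th character of the decompressed string?" for a run-length-encoded input, without decompressing the whole string. The encoding, the names (RUNS, ENDS, char_at), and the example data are assumptions for illustration; the proposal's actual compression schemes may differ. The auxiliary index is proportional to the compressed size, and each query costs a single binary search.

```python
import bisect

# Illustrative run-length-encoded string: (character, run length) pairs.
RUNS = [("a", 5), ("b", 2), ("c", 7), ("a", 3)]

# One-time index of cumulative run end positions.  Its size is
# proportional to the compressed representation, not the output.
ENDS = []
_total = 0
for _, _length in RUNS:
    _total += _length
    ENDS.append(_total)

def char_at(i):
    """Return the i-th character of the decompressed string by binary
    search over run boundaries, touching O(log n) index entries and
    one run of the compressed input."""
    if not 0 <= i < ENDS[-1]:
        raise IndexError(i)
    return RUNS[bisect.bisect_right(ENDS, i)][0]
```

Because each query is answered independently from the compressed input and the small index, queries to arbitrary output positions can be served in any order, by any number of parallel copies, with no shared state.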