The amount of existing and newly appearing data accumulated by scientific, governmental and business organizations around the world is daunting. As data of all types gets easier to obtain and cheaper to store, data sets are becoming increasingly large. Consequently, there is a need to perform computational tasks on massive data sets: comparing genomes, compressing media files, searching through and clustering large sets of documents, studying the Internet graph, and compiling census data, to name just a few. This research project has its roots in the following question, fundamental to several fields:

* Can we still compute something useful about a data set when reading all of it is prohibitively expensive? In other words, what can we do in time sublinear in the length of the input?

For most interesting problems, an exact algorithm provably has to read the entire input, and thus requires at least linear time. However, sublinear algorithms hold the promise of extremely efficient approximate computations.

This award integrates research activities aimed at broadening the scope and extending the applicability of sublinear algorithms, with educational activities aimed at building a strong theoretical computer science group at Penn State and broadly disseminating new findings. It pursues three general directions for investigating the power of sublinear algorithms: (1) designing and analyzing new sublinear algorithms for concrete tasks that are used in a variety of applications and require new techniques, (2) building the foundations for a coherent theory of sublinear computation, with the focus on generic techniques for algorithm design and analysis, and new models for data access, (3) exploiting the special structure of problems solvable with sublinear-space algorithms in contexts (e.g., data privacy), where extreme efficiency is not a requirement per se, but helps to guarantee other properties, such a robustness to small changes in the data. A theme common to all three directions is to encompass new problems and models of practical relevance.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
0845701
Program Officer
Dmitry Maslov
Project Start
Project End
Budget Start
2009-01-01
Budget End
2014-12-31
Support Year
Fiscal Year
2008
Total Cost
$408,000
Indirect Cost
Name
Pennsylvania State University
Department
Type
DUNS #
City
University Park
State
PA
Country
United States
Zip Code
16802