The rampant growth of massive data sets presents new challenges for data processing and analysis. To cope with this phenomenon, sublinear time and sublinear space algorithms that are capable of analyzing and extracting value from such immense inputs must be developed. This project aims to study sublinear time and space algorithms from a unified perspective, using the synergies in order to gain a better understanding that will lead to faster, space efficient and more widely applicable algorithms.
The proposed research has two core components. The first component is the study of sparse representations of large data sets. Sparse representations are useful for quickly analyzing and processing data. This component itself will have two parts. First, it will lead to a better understanding of sublinear time sampling algorithms for data coming from a succinctly described distribution. The succinct descriptions considered include those defined by a small number of parameters, such as power laws, Gaussians, and histogram distributions. Second, it will improve the current state of knowledge of streaming algorithms, data sketching, compressive sensing and sparse recovery techniques. Such techniques will for example have an impact on algorithms for acquiring and processing images, audio and network data.
The second component aims to design novel statistical techniques for understanding various distributional quantities describing commonly used structured objects such as graphs. The focus in this component will be on sublinear time and space algorithms that estimate parameters of graphs. This project will significantly advance the algorithmic foundations of algorithms with limited resources, and develop highly efficient algorithms for analyzing massive data sets.
This project will significantly advance the algorithmic foundations of computation with limited resources. It will develop highly efficient algorithms for analyzing massive data sets that arise in diverse areas including electronic commerce, health, network security, and scientific data collection.
The broader impacts of this project are in the education and mentoring of young researchers including underrepresented groups. Community outreach activities will introduce primary school children to interesting mathematical and computer science ideas.