Pervasive sensors continuously collect and record massive amounts of high-dimensional data from communication, social, and biological networks, while the growing storage and processing capacities of modern computers offer powerful new ways to dig into such huge quantities of information; the need for novel analytic tools to comb through these "big data" has thus become imperative. The objective of this project is to develop a novel framework of nonlinear, data-adaptive (de)compression algorithms that learn the latent structure within large-scale, incomplete, or corrupted datasets in order to compress and store only the essential information, run analytics in real time, infer missing pieces of a dataset, and reconstruct the original data from their compressed renditions.
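As a purely illustrative sketch of the kind of latent-structure inference envisioned here (not the project's actual algorithm), the snippet below imputes missing entries of a data matrix by fitting a low-rank factorization to the observed entries only; the rank, step size, and iteration count are hypothetical choices made for the example.

```python
import numpy as np

def lowrank_impute(Y, mask, rank=5, lr=1e-3, n_iters=2000, seed=0):
    """Fill in unobserved entries of Y (mask == False) via a rank-`rank` fit."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    L = 0.1 * rng.standard_normal((m, rank))
    R = 0.1 * rng.standard_normal((n, rank))
    for _ in range(n_iters):
        resid = mask * (L @ R.T - Y)   # error on the observed entries only
        gL = resid @ R                 # gradient w.r.t. the left factor
        gR = resid.T @ L               # gradient w.r.t. the right factor
        L -= lr * gL
        R -= lr * gR
    Yhat = L @ R.T
    return np.where(mask, Y, Yhat)     # keep observed values, fill in the rest

# Toy usage: a random rank-3 matrix with roughly 40% of its entries missing.
rng = np.random.default_rng(1)
Y_true = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
mask = rng.random(Y_true.shape) > 0.4
Y_filled = lowrank_impute(Y_true * mask, mask, rank=3)
print("imputation RMSE:", np.sqrt(np.mean((Y_filled - Y_true) ** 2)))
```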
The intellectual merit lies in the exploration of the fertile but largely unexplored areas of manifold learning, nonlinear dimensionality reduction, and sparsity-aware techniques for compression and recovery of missing or compromised measurements. Capitalizing on recent advances in machine learning and signal processing, the project envisions differential geometry, sparsity, and dictionary learning as key enablers. Effort will also be devoted to developing online and distributed (non)linear dimensionality reduction algorithms that enable streaming analytics of sequential measurements on parallel processors.
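As a minimal sketch of what online dimensionality reduction for streaming measurements can look like (an Oja-style subspace tracker chosen purely for illustration, with assumed step size and dimensions, not the project's method), each incoming sample updates an orthonormal basis of the dominant subspace without storing past data:

```python
import numpy as np

def oja_subspace_update(U, x, step=0.01):
    """One streaming update of the r-dimensional subspace estimate U (d x r)."""
    y = U.T @ x                            # project the new sample onto the subspace
    U = U + step * np.outer(x - U @ y, y)  # Oja-style correction toward x
    Q, _ = np.linalg.qr(U)                 # re-orthonormalize the basis
    return Q

# Toy stream: noisy samples drawn from a 3-dimensional subspace of R^50.
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((50, 3)))[0]
U = np.linalg.qr(rng.standard_normal((50, 3)))[0]
for _ in range(5000):
    x = basis @ rng.standard_normal(3) + 0.01 * rng.standard_normal(50)
    U = oja_subspace_update(U, x)

# Compress a new sample to a 3-dimensional code and reconstruct it.
x_new = basis @ rng.standard_normal(3)
code = U.T @ x_new
x_rec = U @ code
print("relative reconstruction error:",
      np.linalg.norm(x_rec - x_new) / np.linalg.norm(x_new))
```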
The broader impact lies in contributing novel computational methods and tools for data inference, cleansing, forecasting, and collaborative filtering, with direct relevance to statistical signal processing and machine learning applications in large-scale data analysis, including communication, social, and biological networks.