This project will provide the scientific community with new tools for analyzing large and complex data sets through the design and implementation of new algorithms based on compressive sensing. While typically complex and high-dimensional, modern data sets often have a concise underlying structure. Compressive sensing is a new data-acquisition paradigm that allows sparse data to be randomly undersampled without sacrificing reconstruction accuracy. It relies on theoretical results showing that an overwhelming majority of the information content in sparse data can be captured from a relatively small number of random measurements. Recent advances in the core cyberinfrastructure areas of computation and data analysis have demonstrated that efficient matrix computations may also be realized through randomized techniques. Resting on the same theoretical underpinnings, the success of both compressive sensing and randomized numerical linear algebra suggests a new framework in which random sampling is integrated into data acquisition and processing to yield new, highly efficient computational algorithms. Because randomized matrix operations are amenable to parallelization, such algorithms may gain further efficiency on modern multi-processor architectures. The main research objective of this work is the design of both structured sensing matrices and new randomized computational algorithms for processing compressively sensed data. A careful theoretical study will establish how mathematical operations on compressed samples can be used to manipulate the data from which the samples were acquired. With this understanding, algorithms will be designed that integrate the compressive sensing acquisition process with randomized numerical matrix computations.
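To make the two pillars named above concrete, the following is a minimal Python sketch (assuming only numpy): a sparse vector is recovered from a small number of random Gaussian measurements by orthogonal matching pursuit, and a low-rank matrix is approximated by a randomized range finder in the style of Halko, Martinsson, and Tropp. The function names omp_recover and randomized_rank_r_approx, and all parameter values, are illustrative assumptions, not deliverables of this proposal.

    import numpy as np

    rng = np.random.default_rng(0)

    def omp_recover(A, y, k):
        """Recover a k-sparse vector x from measurements y = A @ x
        via orthogonal matching pursuit (greedy support selection)."""
        m, n = A.shape
        residual = y.copy()
        support = []
        for _ in range(k):
            # Pick the column most correlated with the current residual.
            idx = int(np.argmax(np.abs(A.T @ residual)))
            if idx not in support:
                support.append(idx)
            # Least-squares fit of y on the selected columns.
            coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coeffs
        x_hat = np.zeros(n)
        x_hat[support] = coeffs
        return x_hat

    def randomized_rank_r_approx(M, r, oversample=10):
        """Randomized range finder: sketch M with a random test matrix,
        then take a small exact SVD of the projected problem."""
        Omega = rng.standard_normal((M.shape[1], r + oversample))
        Q, _ = np.linalg.qr(M @ Omega)   # orthonormal basis for the sketched range
        U_small, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
        return (Q @ U_small)[:, :r], s[:r], Vt[:r]

    # Compressive sensing: m << n random measurements of a k-sparse signal.
    n, m, k = 1000, 100, 5
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
    x_hat = omp_recover(A, A @ x, k)
    print(np.linalg.norm(x - x_hat))               # near machine precision

    # Randomized numerical linear algebra: approximate a rank-10 matrix.
    M = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 400))
    U, s, Vt = randomized_rank_r_approx(M, r=10)
    print(np.linalg.norm(M - (U * s) @ Vt) / np.linalg.norm(M))  # near zero

Both routines succeed with high probability because a random Gaussian matrix preserves the geometry of sparse (respectively, low-rank) data with overwhelming likelihood, which is precisely the theoretical underpinning shared by the two fields.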
Massive amounts of data are produced by a wide range of scientific disciplines (e.g., genomics, astrophysics, internet and network analysis) as well as by commerce and industry (e.g., inventory databases, consumer behavior tracking). New algorithms for efficiently processing such data may therefore have both economic and scientific impact. As compressive sensing is particularly applicable to imaging, the medical community stands to benefit from the results of this research through reduced image acquisition times and faster diagnoses enabled by novel computational methods. The theoretical advances that will result from this proposal are applicable to other mathematical areas, such as numerical analysis and computational harmonic analysis. The algorithms developed in this work will be implemented in high-quality code made publicly available to the general scientific community.
Massive data sets are produced by a wide range of scientific disciplines, commerce, and society, and new computational methods are needed to process data sets of ever-increasing size. Fortunately, such data sets often exhibit a concise underlying structure that may be exploited to realize efficient processing algorithms. A signal or datum is considered sparse if it depends on fewer degrees of freedom than the dimension in which it is observed. Recent results from the compressed sensing (CS) community show that sparse data can be reconstructed (exactly or approximately) from far fewer measurements than traditionally required. Sparse, low-rank, and other compressible data models have enabled many important advances during the past decade. The major goal of this project is the development of new frameworks for efficient computational algorithms based on compressible data models. Many compressible data sets exhibit an underlying nonlinear structure that does not adhere to the "transform sparsity" model required for CS. Mathematically, such data sets are modeled as points in a high-dimensional space that are confined to a low-dimensional "surface" (i.e., a manifold). The intellectual merit of this project lies in the development of a framework by which randomized sampling schemes may be applied to data sets exhibiting complicated nonlinear properties that are nonetheless "sparse" or compressible. Broader impacts of the project will result from providing the scientific and industrial communities with new tools for processing massive data sets.

During this project, methods for compressively sampling data under new sparsity models have been investigated. In particular, we have developed and improved theory for recovering the local linear structure (the local tangent space) of manifold-valued data. Recovering this low-dimensional linear structure from high-dimensional data provides a framework for adapting CS and randomized sampling methods to this more flexible, nonlinear data model. Further, we have developed and implemented an algorithm for recovering the local origin of such piecewise linear models from noisy data points. This code is publicly available for download at www.danielkaslovsky.com/code. The results of this project will impact fields such as mathematics, engineering, and signal processing by generalizing and extending the use of CS beyond the standard transform sparsity model, allowing for application to wider classes of data. The overall project has the potential to greatly reduce the storage and time needed for modern data acquisition and processing in many facets of society (e.g., inventory databases, consumer behavior tracking, and medical imaging and records).
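The released code at www.danielkaslovsky.com/code is the authoritative reference for the project's algorithm. As a rough illustration of the underlying idea only, the sketch below estimates a local tangent space from a noisy point cloud by principal component analysis of a point's neighborhood. The function local_tangent_basis, the neighborhood size, and the test data are all hypothetical choices made for this example and are not the project's method.

    import numpy as np

    def local_tangent_basis(X, center_idx, n_neighbors, d):
        """Estimate a d-dimensional tangent basis at one point of a noisy
        point cloud X (rows = points) via SVD of its centered neighborhood.
        A generic local-PCA sketch, not the project's released algorithm."""
        dists = np.linalg.norm(X - X[center_idx], axis=1)
        nbrs = X[np.argsort(dists)[:n_neighbors]]
        centered = nbrs - nbrs.mean(axis=0)
        # The top d right singular vectors span the estimated tangent space.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        return Vt[:d]

    # Noisy samples from a 1-D manifold (a circle) embedded in R^3.
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 500)
    X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
    X += 0.01 * rng.standard_normal(X.shape)
    B = local_tangent_basis(X, center_idx=0, n_neighbors=25, d=1)
    print(B)   # approximately the unit tangent to the circle at X[0]

In this simplified setting, the quality of the estimate hinges on the neighborhood size: too few neighbors and the noise dominates, too many and the curvature of the manifold contaminates the linear fit, which is exactly the trade-off the project's theory addresses.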