Massive, high-dimensional data sets arise in many fields of contemporary science and introduce new challenges. In machine learning, the well-known curse of dimensionality implies that a large amount of training data is required to achieve a fixed prediction accuracy. In image and signal recovery, a large number of measurements is needed to recover a high-dimensional vector unless further assumptions are made. Fortunately, many real-world data sets exhibit low-dimensional geometric structures due to rich local regularities, global symmetries, repetitive patterns, or redundant sampling. The PI will explore low-dimensional geometric structures in data sets for feature extraction, data prediction, and signal recovery. Dimension reduction and function approximation from a set of training data are of central interest in machine learning and data science. When data are concentrated near a low-dimensional set or the function has low complexity, the PI will develop new, fast machine learning algorithms whose performance depends on the complexity of the data or the function rather than on the dimension of the data sets. In image and signal recovery, an interesting problem is to recover a high-dimensional, sparse vector from a small number of structured measurements. This problem is challenging because sensing matrices arising in imaging and signal processing are often deterministic, structured, and highly coherent (some columns are highly correlated), so standard theory and algorithms do not apply. The PI will exploit the structure of these sensing matrices, develop efficient algorithms, and prove performance guarantees. The theory and fast algorithms developed in this project can be applied to a wide range of problems in data compression, image analysis, computer vision, and signal recovery.

High-dimensional data arise in many fields of contemporary science and introduce new challenges. Fortunately, many real-world data sets exhibit low-dimensional geometric structures. This project focuses on exploiting these structures and developing novel methods for dimension reduction, function approximation, and signal recovery. The PI will work on two sets of problems. In the first, a data set is modeled as a point cloud in a D-dimensional space that concentrates near a d-dimensional manifold, where d is much smaller than D. She plans to exploit the geometric structure of the data to build low-dimensional representations of the data and to approximate functions on the data. Function approximation in Euclidean spaces has been well studied; however, classical estimators converge to the true function extremely slowly in high dimensions. When data are concentrated near a d-dimensional manifold, or the function has low complexity, the PI aims to construct estimators that converge to the true function at a faster rate that depends on the intrinsic dimension d. The proposed approach is based on the PI's recent work on adaptive geometric approximations for intrinsically low-dimensional data, in which a data-driven, fast, and robust scheme was developed to construct low-dimensional geometric approximations of data. The second set of problems arises from imaging and signal processing, where the goal is to recover a high-dimensional, sparse vector from its noisy low-frequency Fourier coefficients. This problem is related to super-resolution in imaging, since the missing high-frequency Fourier coefficients correspond to the high-resolution components of the vector. Many existing methods fail because some columns of the sensing matrix are highly correlated. The PI will exploit the structure of the sensing matrix, develop efficient algorithms, and prove performance guarantees. A mathematical theory will be developed to explain the fundamental difficulty of super-resolution, as well as the resolution limit of superior subspace methods, such as MUSIC, ESPRIT, and the matrix pencil method.
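
To illustrate why highly correlated columns make recovery from low-frequency Fourier coefficients difficult, the following minimal sketch compares the mutual coherence of a Fourier sensing matrix on a coarse grid with that on a refined grid. It is not the PI's method. The grid model, the function names `fourier_sensing_matrix` and `mutual_coherence`, the choice of 20 frequencies, and the refinement factor of 10 are all assumptions made for illustration.

```python
import numpy as np


def fourier_sensing_matrix(num_freqs, grid_size):
    """Hypothetical sensing matrix: rows are the frequencies 0..num_freqs-1,
    columns are the grid points j / grid_size on [0, 1)."""
    freqs = np.arange(num_freqs)[:, None]
    grid = np.arange(grid_size)[None, :] / grid_size
    return np.exp(-2j * np.pi * freqs * grid)


def mutual_coherence(A):
    """Largest absolute normalized inner product between two distinct columns."""
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(A_unit.conj().T @ A_unit)
    np.fill_diagonal(gram, 0.0)  # ignore each column's product with itself
    return gram.max()


M = 20  # number of low-frequency Fourier coefficients (assumed value)

# Grid spacing 1/M: the matrix is a full DFT matrix with orthogonal columns.
coarse = mutual_coherence(fourier_sensing_matrix(M, M))

# Grid spacing 1/(10 M): neighboring columns become nearly parallel.
fine = mutual_coherence(fourier_sensing_matrix(M, 10 * M))

print(f"coherence on coarse grid: {coarse:.2e}")
print(f"coherence on fine grid:   {fine:.4f}")
```

Under these assumptions the coarse-grid coherence is zero up to rounding error, while the fine-grid coherence is close to 1. Recovery guarantees that rely on low coherence therefore do not carry over once the grid is finer than the resolution set by the available frequencies.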

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1818751
Program Officer: Yuliya Gorb
Project Start:
Project End:
Budget Start: 2018-06-15
Budget End: 2021-05-31
Support Year:
Fiscal Year: 2018
Total Cost: $215,385
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332