As complex high-dimensional data are generated at large scale across a wide variety of scientific fields, exploratory data analysis is crucial for gaining a better understanding of the data-generating process. Indeed, the first step in the data analysis pipeline is arguably to use unsupervised machine learning methods that help the analyst effectively visualize and understand the data. This project will concentrate on developing a deeper understanding of such methods so that their outputs can be interpreted more reliably. The novel methodology developed in this project will be disseminated to applied fields through the PIs' existing collaborations. Implementations of the developed methodologies will be made publicly available as open-source packages. This project will also train graduate and undergraduate students (from socio-economically disadvantaged backgrounds) for successful careers in statistical data science.

More specifically, the main goal of this project is to develop statistical and computational methods to extract low-dimensional geometric and topological structure present in high-dimensional datasets. The contributions of this project will hence lie at the intersection of statistical machine learning and geometric and topological data analysis. The PIs will work both on developing a deeper understanding of existing methodology via a geometric lens, and on proposing novel methodology for unsupervised machine learning via a topological lens. In the first part, the PIs will study why certain orthogonal cone structures emerge when low-dimensional embeddings of high-dimensional data are constructed with non-linear dimension reduction techniques such as kernel principal component analysis, diffusion maps, the non-local method ISOMAP, Locally Linear Embedding (LLE), and topological methods such as Uniform Manifold Approximation and Projection (UMAP). In the second part, the PIs will develop and analyze novel dimension reduction techniques that preserve the topological information present in high-dimensional data. Finally, the PIs will examine the use of topological regularization techniques for regression and classification, from both a theoretical and a methodological perspective.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 2053918
Program Officer: Yong Zeng
Budget Start: 2021-07-01
Budget End: 2024-06-30
Fiscal Year: 2020
Total Cost: $106,764
Name: University of California Davis
City: Davis
State: CA
Country: United States
Zip Code: 95618