Data analysis can be described as the dual process of extracting information from observations, and of understanding patterns in a principled manner. This process and the deployment of data-centric technologies have recently brought unprecedented advances in many scientific fields, as well as increased global prosperity with the advent of knowledge-based economies and systems. At a high level, this revolution is driven by two thrusts: the modern technologies which allow for the collection of complex data sets, and the theories and algorithms we use to make sense of them. That said, and for all its benefits, extracting actionable knowledge from data is difficult. Observations gathered in uncontrolled environments are often high-dimensional, complex and noisy; and even when controlled experiments are used, the intricate systems that underlie them (like those from meteorology, chemistry, medicine and biology) can yield data sets with highly nontrivial underlying topology. Topology here refers to properties such as the number of disconnected pieces (i.e., clusters), the existence of holes or the orientability of the data space. The research funded through this CAREER award will leverage ideas from algebraic topology to address data science questions like visualization and representation of complex data sets, as well as the challenges posed by nontrivial topology when designing learning systems for prediction and classification. This work will be integrated into the educational program of the PI through the creation of an online TDA (Topological Data Analysis) academy, with the dual purpose of lowering the barrier of entry into the field for data scientists and academics, and increasing the representation of underserved communities in the field of computational mathematics. The project provides research training opportunities for graduate students.

Understanding the set of maps between topological spaces has led to rich and sophisticated mathematics, for it subsumes algebraic invariants like homotopy groups and generalized (co)homology theories. And while several data science questions, including nonlinear dimensionality reduction and supervised learning, are discrete versions of mapping space problems, the corresponding theoretical and algorithmic treatment is currently lacking. This CAREER award will contribute towards remedying this situation. The research articulated here seeks to launch a novel program addressing the theory and algorithms of how the underlying topology of a data set can be leveraged for data modeling (e.g., in dimensionality reduction) as well as when learning maps between complex data spaces (e.g., in supervised learning). This work will yield methodologies for the computation of topology-aware and robust multiscale coordinatizations for data via classifying spaces, a computational theory of topological obstructions to the robust extension of maps between data sets, as well as the introduction of modern deep learning paradigms in order to learn maps between non-Euclidean data sets.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1943758
Program Officer: Yong Zeng
Project Start:
Project End:
Budget Start: 2020-05-01
Budget End: 2025-04-30
Support Year:
Fiscal Year: 2019
Total Cost: $76,700
Indirect Cost:
Name: Michigan State University
Department:
Type:
DUNS #:
City: East Lansing
State: MI
Country: United States
Zip Code: 48824