Comparing topological spaces, especially those arising from noisy data, is difficult. Topological data analysis (TDA) captures the 'shape' of data with descriptors (e.g., persistence diagrams and Reeb graphs). Individually, these topological descriptors have proven to be powerful data-analysis tools; however, a single topological descriptor is often not rich enough to capture the intricacies of the large, complex data that arise in real applications. For that reason, this project develops a framework for studying topological spaces and data by transforming them into families of descriptors, each of which captures the topology of a different 'view' of the data. Studying such a family of descriptors enables new methods for summarizing and comparing topological spaces, and creates a pathway for their use in statistical settings. As a result, this project will develop usable, theoretically grounded data-analysis techniques that will enable TDA for large, complex data, including networks, images, and point clouds. Through the research activities, the investigator will train graduate and undergraduate students in interdisciplinary research, with special efforts made to recruit first-generation and underrepresented minority students. The proposed educational activities will promote a sense of belonging among first-generation and underrepresented minority students as graduate students in STEM and, more generally, as future academics in interdisciplinary domains.
More specifically, this project builds the foundations for representing and comparing large, complex topological spaces and data sets through parameterized families of topological descriptors, where a topological descriptor is any summary of a topological space. These families of topological descriptors are called topological transforms. The project has two objectives: (1) Studying topological transforms and quantifying their ability to represent topological spaces. The project team will study existing transforms, reframe other topological and data-analysis concepts as topological transforms, and propose new transforms. The topological transform framework is flexible because it allows a diverse set of topological descriptors (from persistence diagrams to Euler characteristic curves to small graphs) and a choice of parameterization set (e.g., subspaces or ambient directions). (2) Developing the statistical tools necessary for using topological transforms for data summarization and comparison. In doing so, the project team will define distances between transforms and study the transforms as distributions over spaces of topological descriptors. Throughout, theoretical developments will be grounded in applications, thus establishing mathematical foundations and algorithmic developments that address core issues in analyzing data from real-world applications.
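As an illustration of a topological transform parameterized by ambient directions, the following is a minimal sketch of one well-known instance, the Euler characteristic transform, for a simplicial complex embedded in the plane. It is not the project's implementation: the function names, the representation of simplices as tuples of vertex indices, and the choice of evenly spaced directions and fixed thresholds are all assumptions made for the example.

```python
import math


def euler_characteristic_curve(vertices, simplices, direction, thresholds):
    """Euler characteristic of each sublevel set in one direction."""
    # Height of each vertex: its inner product with the direction vector.
    heights = [sum(x * d for x, d in zip(v, direction)) for v in vertices]
    # A simplex enters the filtration at the height of its highest vertex
    # and contributes (-1)^dim to the Euler characteristic.
    entries = [
        (max(heights[i] for i in simplex), (-1) ** (len(simplex) - 1))
        for simplex in simplices
    ]
    return [sum(sign for h, sign in entries if h <= t) for t in thresholds]


def euler_characteristic_transform(vertices, simplices, n_directions, thresholds):
    """One curve per direction, for evenly spaced unit directions in the plane."""
    transform = []
    for k in range(n_directions):
        angle = 2 * math.pi * k / n_directions
        direction = (math.cos(angle), math.sin(angle))
        curve = euler_characteristic_curve(vertices, simplices, direction, thresholds)
        transform.append((direction, curve))
    return transform


# Hypothetical example data: a hollow and a filled triangle on the same vertices.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hollow = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]
filled = hollow + [(0, 1, 2)]
thresholds = [-0.5, 0.5, 1.5]

print(euler_characteristic_curve(vertices, hollow, (1.0, 0.0), thresholds))  # [0, 1, 0]
print(euler_characteristic_curve(vertices, filled, (1.0, 0.0), thresholds))  # [0, 1, 1]
```

In this sketch the descriptor is an Euler characteristic curve and the parameterization set is a finite sample of directions; swapping in a different descriptor or parameter set gives a different transform within the same framework.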
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.