How do we know if big data is big enough? As algorithms make more and more decisions from data, we also need these algorithms to assure us that the decisions were well informed, i.e., that enough data went into them. The theory of homological sensor networks, a branch of topological data analysis, was originally created to test whether a collection of sensors covers a domain of interest, but the same theory can test whether a data set "covers" an underlying decision space. Homological methods can complement, extend, and even replace statistical methods to give confidence in the completeness of a data set. Because they are topological, they can give robust signatures or summaries of data that are invariant to a wide range of implicit or explicit transformations. This project aims to extend the theoretical and algorithmic foundations of homological sensor networks so that they apply to data analysis. Broader impacts include strengthening connections between theoretical computer science (TCS) and applied algebraic topology, and widening the range of data analyses to which topological methods and tools apply.
The PI will train both undergraduate and graduate researchers by incorporating advanced concepts in combinatorial topology into undergraduate and graduate curricula. The PI will also educate the larger TCS and data analysis communities through expository videos and open-source software.
The first specific aim of the proposal is to extend guarantees on homological sensor networks to apply to non-smooth sets, k-coverage, and dynamic coverage. The second specific aim is to push these algorithmic results back into the theoretical foundations of the sampling theories that underlie data analysis problems, by extending the so-called Persistent Nerve Theorem and defining new classes of near-homeomorphisms to capture the realities of unknown transformations in data while still providing theoretical guarantees. The third specific aim is to develop algorithms that extract information from what was traditionally called "topological noise". Simple experiments reveal that, although this noise does not carry topological information, it does carry useful geometric information that may be used for classification and inference.