Data analysis is a fundamental problem in computational science, ubiquitous in a broad range of application fields, from computer graphics to geographics information system, from sensor networks to social networks, and from economics to biological science. Two complementary fields that have driven modern data analysis are computational geometry and statistical learning. The former focuses on detailed and precise models characterizing low-dimensional geometric phenomena. The latter focuses on robust or predictive inference of models given noisy high-dimensional data. This project aims to initiate a dialog between these two fields with geometry being the central theme. A closer interaction between them will benefit and advance both fields, and can potentially fundamentally change the way we view and perform data analysis.

Specifically, on one hand, the type of data common in the learning community poses several challenges for traditional computational geometry methods. The shift of focus to these challenges and the modeling of uncertainty central in statistical learning can broaden the scope of computational geometry, and lead to geometric algorithms and models that are more robust to noise and extend to high-dimensional data analysis. On the other hand, computational geometry has developed many elegant structures that contain often detailed and precise information about the underlying domain. Models parameterized using these structures can lead to statistical learning models and algorithms that are richer and more interpretable but remain robust to noise and are predictive.

This project is multi-disciplinary in nature, and will involve fields including computational geometry, algorithms, statistics, differential geometry and topology. Education will be integrated in this project.

Project Report

Data analysis is a fundamental problem in computational science, ubiquitous in a broad range of application fields, from computer graphics to geographic information system, from sensor networks to social networks, and from economics to biological science. A key component of data analysis is to understand the geometric structure of the space these data sampled from. The field of computational geometry has traditionally been one of the main areas to study discrete geometric structures, focusing on their algorithmic and combinatorial aspects. In machine learning and statistics, inference of model parameters and extraction of information are central problems. Recently there has been renewed interest in using geometric structures to infer models and parameters (e.g., in manifold learning). Given that in both fields the geometric structure of discrete objects are of fundamental concern, it seems natural that strong connections between the algorithmic and mathematical tools in both fields should exist. However, so far, their focuses and approaches have been rather disjoint. This EAGER project investigated the connections between these disciplines, exploring the cross-section of the two fields with the common theme of "geometry", and studying how ideas from these two fields can augment each other. Specifically, the investigators focused on two main themes in this project. First, in both computational geometry and manifold learning, one often assumes that the input data are sampled from a hidden smooth manifold. Theoretical guarantees typically rely on such assumptions. A more realistic setting, yet still amenable to analysis, is that the input domain is a collection of manifolds with potential intersections and singularities. The project initiated work to study spaces more complex than manifolds. In particular, we focused on the so-called singular manifold, and provided important initial analysis of how an empirical operator widely used in data analysis applications, the Gaussian-weighted graph Laplacian, behaves on such singular manifolds. The resulting understanding helped us to develop algorithms to learn a collection of (potentially intersecting) surface patches from their point samples. This project advanced learning more complex geometric structures from both theoretical and practical points of view. Secondly, this project explored how to leverage the elegant topological structures and quantities developed in the field of computational geometry and topology for modern data analysis applications. To this end, we developed a new algorithm to learn a geometric graph structure from a set of point samples, which is robust, effective, and can be used for diverse inputs. We also used a simple topological quantity to help enrich the so-called Morse crystals, which are concise representations for high dimensional scalar fields. This project helped to consolidate the dialog between the fields of computational geometry and statistical learning , and made contributions to our understanding and manipulation of modern data. The PhD students supported by this project worked closely with the PIs and were trained in a multi-disciplinary environment that involves computational geometry, algorithms, machine learning, and statistics. Results from this project were published in top venues in both computational geometry and in machine learning.

Project Start
Project End
Budget Start
2010-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2010
Total Cost
$196,000
Indirect Cost
Name
Ohio State University
Department
Type
DUNS #
City
Columbus
State
OH
Country
United States
Zip Code
43210