This work brings together methods from differential geometry and statistics to construct learning algorithms for high dimensional data sets. At the heart of our approach is the assumption that although natural data is presented in a high dimensional space, it actually lies on a low dimensional manifold embedded in that space. The dimension of this manifold corresponds to the number of degrees of freedom of the data generating process. Consequently, the proper domain on which classifiers should act is this low dimensional manifold. Using the Laplace-Beltrami operator, an appropriate basis of functions can be defined on this manifold invariantly and intrinsically. The classifier is then expressed in terms of this basis, and the coefficients are estimated from training examples. When the manifold is unknown, we construct a graph approximation from discretely sampled data points, replace the Laplace-Beltrami operator with the graph Laplacian, and perform the analogous computations on the graph.
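For concreteness, the following Python sketch illustrates one way the discrete version of this procedure might look: build a nearest-neighbor graph with heat-kernel weights, take the lowest eigenvectors of the graph Laplacian as the basis, and fit the classifier's expansion coefficients by least squares on the labeled points. The neighborhood size k, kernel width sigma, and number of eigenvectors n_eig are illustrative parameters, not values prescribed by this work.

```python
import numpy as np
from scipy.spatial.distance import cdist

def laplacian_eigenbasis(X, k=10, sigma=1.0, n_eig=5):
    """Return the first n_eig eigenvectors of the graph Laplacian,
    a discrete analogue of Laplace-Beltrami eigenfunctions.
    k, sigma, n_eig are illustrative parameters (assumptions)."""
    n = X.shape[0]
    D2 = cdist(X, X, "sqeuclidean")
    W = np.zeros((n, n))
    for i in range(n):
        # connect each point to its k nearest neighbors (skip the point itself)
        idx = np.argsort(D2[i])[1:k + 1]
        W[i, idx] = np.exp(-D2[i, idx] / (2.0 * sigma ** 2))  # heat-kernel weight
    W = np.maximum(W, W.T)              # symmetrize the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W      # unnormalized Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, :n_eig]              # smoothest basis functions on the graph

def fit_predict(X, y_labeled, labeled_idx, k=10, sigma=1.0, n_eig=5):
    """Express the classifier in the Laplacian eigenbasis: estimate
    expansion coefficients from the labeled examples by least squares,
    then evaluate the resulting function at every data point."""
    E = laplacian_eigenbasis(X, k, sigma, n_eig)
    coeffs, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    return np.sign(E @ coeffs)          # +1 / -1 class predictions
```

Because the low eigenvectors of the Laplacian vary slowly along the graph, truncating the basis to n_eig functions acts as a smoothness prior: the classifier is forced to respect the geometry of the data, which is what lets a handful of labels propagate to the unlabeled points.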
In general, we plan to explore the manifold structure of natural data, apply our geometrically motivated learning techniques to practical problems such as partially labeled classification and data representation, and address theoretical questions about the convergence of our sample estimates to the desired true values. In an age when massive high dimensional data sets are being collected in domains ranging from bioinformatics to the Internet, the techniques proposed here will provide new ways to represent, classify, and access such data.