This work brings together methods from differential geometry and statistics to construct learning algorithms for high-dimensional data sets. At the heart of our approach is the assumption that although natural data are embedded in a high-dimensional space, they lie on a low-dimensional manifold within that space. The dimension of this manifold corresponds to the number of degrees of freedom of the data-generating process. Consequently, the proper domain on which classifiers should act is this low-dimensional manifold. Using the Laplace-Beltrami operator, an appropriate basis of functions can be defined invariantly and intrinsically on the manifold. The classifier is then expressed in terms of this basis, and its coefficients are estimated from training examples. When the manifold itself is unknown, we construct a graph approximation from discretely sampled data points, replace the Laplace-Beltrami operator with the graph Laplacian, and perform the analogous computations on the graph.
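The graph-based construction described above can be sketched in a few lines of numpy. This is a minimal illustration, not the project's actual implementation: the epsilon-neighborhood graph with heat-kernel weights, the number of basis functions `k`, and the two-cluster toy data are all illustrative choices made here.

```python
import numpy as np

def graph_laplacian(X, epsilon):
    # Heat-kernel weights on an epsilon-neighborhood graph: a common
    # discrete stand-in for the Laplace-Beltrami operator on sampled data.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / epsilon) * (d2 < epsilon)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W  # unnormalized graph Laplacian

def fit_classifier(X, labeled_idx, y, epsilon=0.5, k=2):
    # Eigenvectors of the graph Laplacian play the role of the intrinsic
    # basis functions; coefficients are fit by least squares using only
    # the labeled examples, then the classifier extends to all points.
    L = graph_laplacian(X, epsilon)
    _, vecs = np.linalg.eigh(L)
    basis = vecs[:, :k]                       # k smoothest basis functions
    a, *_ = np.linalg.lstsq(basis[labeled_idx], y, rcond=None)
    return basis @ a                          # predicted values at every point

# Toy data: two well-separated clusters, with one labeled point in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
pred = fit_classifier(X, labeled_idx=[0, 20], y=np.array([-1.0, 1.0]))
```

Because the two clusters are farther apart than `epsilon`, the graph splits into two components and the smoothest eigenvectors are constant on each, so the two labels propagate to all unlabeled points in their respective clusters.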

In general, we plan to explore the manifold structure of natural data, apply our geometrically motivated learning techniques to practical problems such as partially labeled classification and data representation, and address theoretical questions about the convergence of our sample estimates to the desired true values. In an era when massive high-dimensional data sets are being collected in domains ranging from bioinformatics to the Internet, the techniques proposed here will provide new ways to represent, classify, and access such data.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0310643
Program Officer: Robert B Grafton
Project Start:
Project End:
Budget Start: 2003-06-15
Budget End: 2005-05-31
Support Year:
Fiscal Year: 2003
Total Cost: $97,585
Indirect Cost:
Name: University of Chicago
Department:
Type:
DUNS #:
City: Chicago
State: IL
Country: United States
Zip Code: 60637