This project studies machine learning from data that appears high-dimensional but in fact has low intrinsic dimension (e.g., the data lies on a low-dimensional manifold). Physical constraints in many applications produce exactly this situation. The project is developing machine learning systems whose resource use (e.g., computational time and space) scales with the intrinsic rather than the extrinsic dimension. The idea of data lying on a manifold is appealing and suggestive, and it has inspired much recent work in machine learning. Often the aim is to embed such data into a lower-dimensional space, after which standard methods consume fewer resources. The PIs have developed a precise notion of intrinsic dimension that captures the manifold intuition while being broad enough to be both statistically sensible and empirically verifiable. This quantity is then treated as a fundamental parameter in terms of which a variety of new nonparametric methods can be assessed. The first of these is a simple variant of the k-d tree that is provably adaptive to intrinsic dimension. The PIs also consider schemes for nonparametric classification and regression, for manifold learning, and for embedding. These new algorithms and ideas will be applied to fundamental challenges in a variety of domains, including sensor networks, computer vision, protein structure prediction, and robotic control.
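
The abstract does not say how the k-d tree variant works, so the following is only an illustrative sketch of one way such a tree could be built. It assumes a split rule in which each cell is cut at the median along a random direction instead of along a coordinate axis; that rule, the function names `build_rp_tree` and `leaves`, and the toy data (a circle, intrinsic dimension 1, linearly embedded in 50 dimensions) are all assumptions made for illustration, not the PIs' stated method.

```python
import numpy as np

def build_rp_tree(points, rng, leaf_size=10):
    """Recursively partition an (n, D) array using median splits
    along random directions (assumed split rule, see lead-in)."""
    n, dim = points.shape
    if n <= leaf_size:
        return {"leaf": True, "points": points}
    # Draw a random unit direction in the ambient space.
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    # Project the points onto it and split at the median projection.
    proj = points @ direction
    threshold = np.median(proj)
    left_mask = proj <= threshold
    # Guard against degenerate splits (e.g., many duplicate points).
    if left_mask.all() or not left_mask.any():
        return {"leaf": True, "points": points}
    return {
        "leaf": False,
        "direction": direction,
        "threshold": threshold,
        "left": build_rp_tree(points[left_mask], rng, leaf_size),
        "right": build_rp_tree(points[~left_mask], rng, leaf_size),
    }

def leaves(node):
    """Collect the point arrays stored at the leaves of the tree."""
    if node["leaf"]:
        return [node["points"]]
    return leaves(node["left"]) + leaves(node["right"])

# Toy data: a circle (intrinsic dimension 1) embedded in 50 dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(t), np.sin(t)])   # shape (200, 2)
embedding = rng.standard_normal((2, 50))           # hypothetical linear embedding
data = circle @ embedding                          # shape (200, 50)

tree = build_rp_tree(data, rng, leaf_size=10)
leaf_sizes = [len(p) for p in leaves(tree)]
```

The sketch only builds the partition; it does not demonstrate the adaptivity guarantee mentioned in the abstract, which is a statement about how quickly cell diameters shrink with depth.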

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0812598
Program Officer: Richard Voyles
Project Start:
Project End:
Budget Start: 2008-08-01
Budget End: 2012-07-31
Support Year:
Fiscal Year: 2008
Total Cost: $450,000
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093