The investigator and his colleagues use spectral methods to investigate a challenging problem in unsupervised learning and data visualization: given a set of unorganized data points sampled, possibly with noise, from an underlying manifold, compute a set of global coordinates of the data points with respect to that manifold. The approach is guided by the general principle that global structures can emerge from careful analysis of local interactions. The central idea is to use tangent spaces, estimated by weighted PCA, as representations of the local geometry of a nonlinear manifold, and then to align those local tangent spaces globally by computing a partial eigendecomposition of the neighborhood connection matrix, which yields the global structure of the manifold. The study focuses on the following areas: 1) effective local geometric and topological structures for manifold learning; 2) the interaction of the local sampling density, the noise level, the regularity of the manifold, and the local curvature structure, and their effects on the accuracy of manifold learning; 3) local smoothing methods to make manifold learning more robust to noise and outliers; 4) connections with infinite mixture models, one-class support vector machines, and level set methods; 5) efficient and scalable algorithms for large-scale manifold learning problems.

With advances in modern computing technology, we face the challenging problem of extracting useful information from vast amounts of data. The data generated in a variety of applications tend to have tens of thousands of attributes, leading to the so-called "curse of dimensionality." In many applications, however, high-dimensional data points are governed by a few intrinsic degrees of freedom. Think of a set of images depicting a horizontally rotating face: the images are high-dimensional, but the image set has just one intrinsic degree of freedom, the rotation angle of the face. The focus of this proposal is the development of geometric, statistical, and computational methods for extracting those latent intrinsic degrees of freedom by modeling the set of data points as samples from nonlinear manifolds, and the discovery of the global structure of the manifolds from careful analysis of local interactions. The results, algorithms, and techniques developed have immediate applications in bioinformatics, especially gene expression analysis for detecting and distinguishing diseases and disease types; in appearance-based modeling for people detection in video sequences from surveillance cameras; in helping intelligence analysts sift through large amounts of textual information by modeling text document collections more efficiently; and in enhancing computer-supported collaborative learning and performance measurement in second language learning.
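
The tangent-space idea described in the abstract can be illustrated with a minimal sketch. This is not the investigators' implementation: it assumes brute-force nearest neighbors, ordinary (unweighted) PCA in each neighborhood rather than the weighted PCA the abstract mentions, and a full dense eigendecomposition in place of a partial one. The function name `ltsa_embed` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def ltsa_embed(X, n_neighbors=8, n_components=1):
    """Sketch of tangent-space alignment (illustrative, not the authors' code).

    X is an (n_samples, n_features) array. Returns an
    (n_samples, n_components) array of global coordinates.
    """
    n = X.shape[0]

    # Brute-force pairwise squared distances (assumption: n is small).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Each row holds the indices of the k nearest points, the point itself included.
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Alignment matrix, accumulated one neighborhood at a time.
    B = np.zeros((n, n))
    ones = np.full((n_neighbors, 1), 1.0 / np.sqrt(n_neighbors))
    for i in range(n):
        idx = neighbors[i]
        Z = X[idx] - X[idx].mean(axis=0)  # centered neighborhood
        # Local PCA: the leading left singular vectors give (normalized)
        # coordinates of the neighbors in the estimated tangent space.
        U, _, _ = np.linalg.svd(Z, full_matrices=False)
        G = np.hstack([ones, U[:, :n_components]])
        # Penalize global coordinates that are not an affine function
        # of the local tangent coordinates.
        B[np.ix_(idx, idx)] += np.eye(n_neighbors) - G @ G.T

    # Eigenvectors of the smallest eigenvalues; the first is the constant
    # vector and is skipped. A sparse partial eigensolver would replace
    # this dense call at scale.
    _, vecs = np.linalg.eigh(B)
    return vecs[:, 1:n_components + 1]
```

As with the rotating-face example, a quick sanity check is a curve with one intrinsic degree of freedom: for points sampled along a helix segment in three dimensions, the single recovered coordinate should vary monotonically with the curve parameter, up to sign and scale.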

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
0311800
Program Officer
Junping Wang
Project Start
Project End
Budget Start
2003-09-01
Budget End
2007-01-31
Support Year
Fiscal Year
2003
Total Cost
$261,187
Indirect Cost
Name
Pennsylvania State University
Department
Type
DUNS #
City
University Park
State
PA
Country
United States
Zip Code
16802