This project exercises and expands upon methods for automatic discovery of new representations at multiple temporal and spatial scales. The specific framework generalizes classical harmonic analysis, in particular wavelet-based methods, to graphs and manifolds, thereby greatly extending the scope and the desirable characteristics of this multiscale-analysis framework to domains with arbitrary geometries. This framework, termed diffusion wavelets because it is associated with a diffusion process that defines the different scales, has unique properties relevant to learning, function approximation, compression and denoising. The set of core problems that this project addresses include fast algorithms for construction of multiscale diffusion wavelets, approximation of functions on very large graphs and high-dimensional manifolds, out-of-sample extensions of functions on manifolds and graphs, compression and denoising of functions on data sets, perturbation analysis, and randomized algorithms for multiscale analysis. Challenging application domains are being investigated, including analysis of document corpora, Markov decision processes, and 3D image rendering. In each case, multiscale diffusion analysis yields interpretable and meaningful results. For example, when applied to Markov decision processes, diffusion wavelet analysis yields new optimization methods that dynamically aggregate states and actions at multiple levels of abstraction; and when applied to 3D computer graphics, it yields new compression methods that capture geometric features of objects at multiple resolutions.

Project Report

We are literally drowning in 'big data', caused by an explosion in our ability to record, transmit, and store data (from sensor networks to documents on the world wide web). The analysis of data has become one of the most important computational problems of the 21st century. Conventional statistical methods were developed in the early part of the 20th century, and are not readily able to meet the challenge of high-dimensional data analysis in the 21st century. This grant explores new approaches to extracting multi-scale structure from data, building on recent developments in the analysis of digital data developed in mathematics. Specifically, the framework generalizes classical Fourier and wavelet analysis in Euclidean spaces to discrete domains and non-Euclidean spaces, such as graphs and manifolds, thereby enabling these powerful analytical methods to become much more widely applicable to the analysis of massive discrete digital data sets, such as social network analysis on Facebook and other groups, 3D computer graphics, information retrieval using a new generation of more intelligent search engines, and the solution of difficult optimization problems in decision making. The overall goals of this project were to investigate a new class of dimensionality reduction methods that work, not by constructing eigenvectors such as most popular methods such as principal components analysis, a widely used 100 year old method from statistics for constructing lower-dimensional representations, but by constructing a multi-scale analysis of the underlying space using the principles of wavelets on continuous spaces. These methods are called diffusion wavelets, because they perform a multi-scale analysis of random walks on a graph. Although wavelets in continuous spaces have been explored a lot, their discrete counterparts on a graph have been far less explored. The aims of the project were to investigate one specific class of multi-scale wavelets on graphs, understand their theoretical properties, investigate their applications in a variety of areas, and introduce them to the machine learning and AI communities. The major findings from this grant include the development of new ways of reducing the dimensionality of high-dimensional (text, image) data, so that structure can be found at a variety of levels. Additionally, new algorithms were developed to transfer knowledge between data that come from different sources (e.g. text documents in English and Arabic, where the surface level features (words) are entirely different; or two different proteins). These new developments extend the field of machine learning in new directions, building closer connections to classical ideas in harmonic analysis. ??Applications of these algorithms to a variety of large data sets was undertaken. One challenging data set involves a set of 100,000 documents from the European Union, which has documents in 12 different languages (e.g., English, Italian, German etc.). To build correspondences between documents in different languages, one must find mappings that align latent (hidden) concepts, since the surface features (words) are entirely different. The multi-scale manifold alignment framework provides a powerful way to find correspondences between structured objects, by automatically finding latent features. Another application involves the construction of compressed 3D object representations, an unsolved problem in computer graphics. 20 years from now, digital cameras may be capable of routinely taking high-resolution 3D digital images of the world around us. Conventional image compression methods, such as JPEG, do not extend to 3D representations. The multi-scale data analysis method explored in this project was applied to the problem of compressing 3D object representations, and was found to be more effective than previously proposed eigenvector methods. Another dataset that has been used in this research is a collection of 5000 NSF Research Award abstracts made available from the UC Irvine Repository. This analysis revealed groupings among the various topics proposed that were automatically discovered by the multi-scale analysis algorithm. The final application involved the solution of difficult optimization problems called Markov decision processes, which involve finding decisions that provide the maximum payoff in the long run for an autonomous agent acting in a simulated or real environment. Examples include game playing programs, robots operating in the world, simulators of factory processes and so on. Solving these problems requires finding value functions, which convert situations of the world into anticipated values, and enable the agent to decide which action to commit to. In large problems, value functions need to be approximated using a small number of basis functions. The multi-scale analysis framework was shown to provide an elegant method for automatically discovering basis functions for approximating value functions.

Project Start
Project End
Budget Start
2008-09-01
Budget End
2012-08-31
Support Year
Fiscal Year
2008
Total Cost
$365,407
Indirect Cost
Name
University of Massachusetts Amherst
Department
Type
DUNS #
City
Amherst
State
MA
Country
United States
Zip Code
01003