CIF:Small:Wavelets on Graphs -- Theory and Applications -- 1018977 PI: Antonio Ortega (University of Southern California)

A key recent trend has been towards dramatic increases in the amounts of data that can be gathered for analysis. Examples include online social networks, online search logs, DNA analysis, surveillance, among many others. A major challenge is to extract useful information from these large data-sets, to the point that there is a risk that much of these data could be underutilized. This research aims at developing innovative data representation tools to enable significantly faster and more accurate analysis of these emerging data-sets.

This research is motivated by two observations: i) data points in these emerging data-sets can often be seen as part of a large graph and ii) tools for analysis of data on graphs tend to be global in nature, making it difficult to identify trends that manifest themselves in relatively small regions of the graph. Inspired by wavelet techniques developed over the past 20 years, this work studies a new class of wavelets that are defined on graphs. This project studies the underlying theory for these wavelets on graphs, including the design of localized, invertible and critically sampled transforms. The team is also addressing two concrete applications to illustrate the potential benefits of these methods. In one application these new tools are applied to Genomic data sets (where graphs correspond to genetic pathways) and to analysis of data in social networks. The second class of applications consists of applying these new transforms to the development of new tools for image and video processing.

Project Report

Signal processing methods are behind some of today's most widely used technology products, including digital cameras, digital television or cellphones. In these, signals such as images, speech or video are broken down into elementary components (or frequencies) prior to processing. These representations in terms of elementary frequencies are essential to achieve more efficient compression (which translates into lower storage or bandwidth, e.g., more images or videos can be stored or transmitted without additional storage or bandwidth resources). In these conventional signals data is represented by a series of values, or samples, that are regularly spaced. For example, pixels in a digital camera are arranged in a regular grid, similar to a chessboard, such that each row has the same number of pixels. This project has focused on developing processing methods for new datasets that no longer have the regular structure (e.g., grid structure) that conventional signal processing is based on. Examples of these datasets include information available in online social networks, such as Facebook, or in physical networks, such as the Internet. These networks can be viewed as graphs, where each vertex is connected to an arbitrary number of other vertices (for example, each person in an online social network may have a different number of connections). This project has focused on extending to these graphs the signal processing methods that have been so successful for images or video. As part of this work we have developed new approaches to extract elementary frequency components from graph signals. We have also proposed new approaches to "sample" these graph signals, answering the following question: if we can only observe and measure a small number of nodes in a graph/network, what would be the best nodes to observe? Beyond theory, this research has potential for impact in several areas. It will make it possible to process and analyze arbitrarily structured "Big Data" such as is being increasingly captured by sensors in applications including smart grid, Internet optimization or data mining. Our work has also been shown to be useful for learning applications, examples of which include speech recognition, face recognition or computer vision. Finally, it has also been shown that the proposed methods can lead to improvements in processing even for existing signals (e.g., images or video).

Project Start
Project End
Budget Start
2010-08-01
Budget End
2014-07-31
Support Year
Fiscal Year
2010
Total Cost
$500,000
Indirect Cost
Name
University of Southern California
Department
Type
DUNS #
City
Los Angeles
State
CA
Country
United States
Zip Code
90089