In recent years, the advent of new sensors, measurement technologies, and social network infrastructure has provided huge opportunities to visualize complicated interconnected network structures and record data of interest at various locations in such networks. Consequently, there is an explosion of interest and demand to analyze such data and make inferences, predictions, and diagnostics. Examples of such data include, but are not limited to: biology and medicine (e.g., blood flow rates in a network of blood vessels); computer and social sciences (e.g., information flows in social networks); electrical engineering (e.g., sensor networks); hydrology and geology (e.g., river flow measurements in a ramified river network); and civil engineering (e.g., traffic flow on a road network). The investigator and his team will develop mathematical and computational tools referred to as "multiscale basis dictionaries" on a given graph, which will have a positive impact in solving practical data analysis problems on graphs and networks in diverse fields as listed above. In particular, these dictionaries will be able to capture subtle features discriminating anomalous events from normal events on graphs, which may shed light on underlying causes of such anomalies. Students engaged in this project will be trained to be the next generation of interdisciplinary scientists who have deep knowledge in one area yet have open mind to the other areas and try to actively seek collaborations with domain experts. Such an attitude and a perspective will be indispensable for their future career, either in academia or in industry.
The goal of this project is to develop above-mentioned multiscale basis dictionaries and best bases selected from such dictionaries for graphs and networks, and demonstrate the usefulness by examining their performance on a variety of data analysis tasks on graphs and networks such as compression, denoising, semi-supervised learning, and anomaly detection. Mathematical and computational tools for analyzing such datasets, particularly for those on directed graphs, have not been well developed. For more conventional data supported on simple Euclidean domains and data sampled on regular lattices, harmonic analysis tools such as Fourier and wavelet transforms as well as multiscale basis dictionaries, e.g., wavelet packets and local trigonometric transforms, have a proven track record of success. This project can be viewed as the continuing effort of the investigator to transfer and extend these computational harmonic analysis tools from the realm of regular lattices and simple Euclidean domains to more general graph domains. The multiscale basis dictionaries for graphs including a complete Haar-Walsh basis dictionary will certainly enrich the current collection of data analysis tools on such domains because these dictionaries contain a huge number of possible bases from which one can quickly select a basis most suitable for a given task via the best-basis selection algorithm. In particular, any addition of mathematical and computational tools for data analysis on directed graphs is well rewarded since there are comparably few tools available despite their practical importance. This is partly because so many classes of directed graphs exist, and consequently, there has been confusion over the definitions of graph Laplacian matrices. Instead, this project provides a new viewpoint: on a directed graph, the connectivity between any two vertices are not a local concept; rather it is a global concept. Finding a shortest path connecting a given pair of vertices provides critical information on a directed graph. To utilize such information fully, spectral analysis of the distance matrices and the associated integral operators on a directed graph is performed using the singular value decomposition instead of analyzing the graph Laplacians using the eigendecomposition.