Society is awash in complex data that is recorded by high-resolution sensors and through our traces of online activity. One of the key challenges in turning this data into actionable insight is finding hidden groups in these data. For instance, finding similar types of land based on its spectral composition, or finding groups of like-minded people in a social network. While there are a variety of methods to solve these problems that are known already, there are none that take advantage of the multi-dimensional features of the data in an integrated way. The work in this proposal will provide an entirely new type of data clustering, or grouping. The method exploits the rich features of these complex modern datasets. Having access to such a method will help society improve its understanding of the results of complex scientific experiments, produce new insights into the common patterns in social networks, and extract valuable information from large databases of sensor information. The methods developed will be general, and thus, they will have broad applicability that will enrich our capability to use data to benefit society.
Spectral clustering is a technique to identify cohesive groups in a database based on simple pairwise relationships between the items. In a social network, for example, people are connected based on pair-wise friendship relationships. This two-way, or two-dimensional, association between people does not capture the richness of real-world social interactions that often involve multiple people. These higher-order interactions among groups will be represented as tensors and hyper-graphs in this project. For instance, an interaction among three people corresponds to a hyper-edge, which will be represented by a three-dimensional tensor. Thus, this project will investigate novel formulations of the spectral clustering problem that work on multi-dimensional relationships represented with tensors. These new methods and their variations will be evaluated based on the insights that they provide into (i) hyper-spectral imaging data, where each pixel has a large number of frequencies measured; (ii) in social network data, where groups of interactions create higher-order relationships; and (iii) in complex simulation data, where parametric variation creates multi-dimensional data and relationships based on space, time and parameters. This project will also investigate fast algorithms to identify the clusters based on the new formulations of spectral clustering, as well as relations between these algorithms and emerging work in computational topology.
All of the software developed as part of this project will be released as open source software in order to maximize its impact. The investigator will organize tutorials on these new formulations at major conferences to attract additional applications.
For further information see the project web site at: www.cs.purdue.edu/homes/dgleich/tensors