Complex networks, which are governed by intricate webs of interactions among their constituent elements, appear in a wide spectrum of fields, including bioinformatics and neuroscience. Entities and relationships in these graphs are increasingly annotated with content, giving rise to rich attributed graphs. Effective integration of attribute data with network topology will enable scientists to ask new questions and make discoveries that are not possible using only one source of data. This project is developing methods to find interesting patterns in networks that represent multiple types of information, e.g., gene expression and protein-protein interaction networks along with disease information, or co-authorship graphs with information on both the conference and year of publication. As an example of the value of such pattern discovery, protein-protein interaction subnetworks whose genes are dysregulated across multiple genome-wide disease expression datasets are useful for diagnosing disease and understanding its mechanisms.
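To make the protein-protein interaction example concrete, the following is a minimal sketch of one simple way such subnetworks could be extracted. It is an illustration under stated assumptions, not the project's method: it assumes a gene counts as relevant if it is flagged as dysregulated in at least `min_datasets` expression datasets, and it reports the connected components of the interaction network restricted to those genes. The function name, the threshold parameter, and the toy gene labels and datasets are all hypothetical.

```python
from collections import defaultdict


def dysregulated_subnetworks(edges, dysregulated_by_dataset, min_datasets=2):
    """Connected components of the interaction graph induced by genes that
    are flagged as dysregulated in at least `min_datasets` datasets.

    Assumption for illustration only: a simple count threshold stands in for
    whatever pattern definition a real analysis would use.
    """
    # Count the number of datasets in which each gene is flagged.
    counts = defaultdict(int)
    for genes in dysregulated_by_dataset.values():
        for gene in set(genes):
            counts[gene] += 1
    keep = {gene for gene, c in counts.items() if c >= min_datasets}

    # Keep only interactions whose two endpoints both pass the threshold.
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in keep and v in keep:
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Depth-first search over the retained interactions. Genes that pass the
    # threshold but have no retained interaction are not reported.
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Toy data (made up): four interactions and three expression datasets.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
dysregulated_by_dataset = {
    "dataset1": {"A", "B", "C", "E"},
    "dataset2": {"A", "B", "C", "F"},
    "dataset3": {"B", "C", "D"},
}
components = dysregulated_subnetworks(edges, dysregulated_by_dataset)
print(components)
```

In this toy input only genes A, B, and C are flagged in two or more datasets, so the single reported subnetwork contains those three genes. A thresholded connected-component search is only the simplest possible definition; the pattern definitions the project pursues may be considerably richer.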
The proposed research aims to develop a suite of algorithmic and analytic methods for analyzing large multi-relation and attributed graphs. Specifically, the project is: (i) developing novel concepts and algorithms for mining interesting cross-graph patterns from multi-relation graphs and coherent patterns from attributed graphs, and (ii) designing computational methods for integrating biological data (e.g., gene expression, protein-protein interaction networks) to discover coherent and phenotype-specific subnetworks. Many subgraph discovery tasks are computationally hard, which poses interesting challenges; the project is using application-motivated (e.g., biologically motivated) definitions as a starting point to identify interesting pattern definitions for which discovery is tractable. In addition to dissemination to the research community, outcomes will include software tools for use by the broader scientific community, new courses in the area of network analysis, educational material for introducing high school students to computational thinking, and increased involvement of minority undergraduate students in research.