Behind every complex system, be it physical, social, biological, or man-made, lies an intricate network that encodes the interactions among its components. Statistical learning over networks has the potential to strengthen one's ability to reason about the behavior of such systems, to understand their innate structure, and, ultimately, to predict their evolution. In the era of 'data deluge,' this promise has yet to be fulfilled, as formidable challenges remain. These include making effective predictions from scarce training samples; providing transparent, easily explainable outcomes; dealing with unreliable data or malicious attempts to undermine the learning process; and handling, in a timely and resource-efficient fashion, massive-scale networks that can change over time. Aspiring to address these challenges, this project pioneers a scalable, expressive, interpretable, and robust multi-purpose framework for learning over networks. The toolbox to be developed is expected to advance the state of the art in data science, network science, graph mining, and big data analytics. It should thus impact, and enable technology transfer to, a broad range of emerging fields, from computational biology and neuroscience to socio-economic networks. On the educational front, the multidisciplinary nature of this research will provide engaging experiences for both undergraduate and graduate students, help disseminate research findings, and cross-fertilize ideas across diverse communities.

The overarching approach in this project unifies learning over graphs under a principled framework of random-walk-based diffusions, with the goal of markedly improving learning performance while also ensuring scalability and reliability. The research consists of three intertwined thrusts: (T1) adaptive diffusions for fast and effective learning over networks, tuned to the task and the underlying network topology; (T2) scalable diffusions for massive and challenging networks; and (T3) robust diffusions capable of learning from untrusted data. The novel approach in T1 capitalizes on the 'landing probabilities' of judiciously constructed random walks, and opens avenues for leveraging meta-information, as well as nonlinear diffusion models, to advance a gamut of learning tasks over possibly dynamic graphs. The research under T2 targets massive and challenging graphs where a prohibitively large landing probability space is necessary to ensure high prediction accuracy. Finally, T3 aspires to cope with sophisticated adversaries that employ graph-structure-aware approaches to infiltrate the network, and investigates lines of defense even in settings where most data are malicious. Analytical and experimental performance evaluation will assess the merits of the novel approaches relative to node embedding and graph convolutional neural network alternatives.
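
As a rough illustration of the landing-probability idea behind T1, the sketch below computes the k-step landing probabilities of a random walk started at a class's labeled nodes, combines them with per-step weights, and labels each node by the class with the larger diffusion score. Everything specific here is an assumption made for illustration and is not taken from the project: the toy graph, the function names `landing_probabilities` and `diffuse`, and the fixed weights `theta`, which an adaptive diffusion would presumably fit to the task and graph rather than fix in advance.

```python
def landing_probabilities(adj, seeds, num_steps):
    """Return [p_1, ..., p_K]; p_k[v] is the probability that a random walk
    started uniformly at random from `seeds` is at node v after k steps."""
    n = len(adj)
    p = [0.0] * n
    for s in seeds:
        p[s] = 1.0 / len(seeds)
    landing = []
    for _ in range(num_steps):
        nxt = [0.0] * n
        for u, nbrs in enumerate(adj):
            if not nbrs or p[u] == 0.0:
                continue  # nothing to spread (mass on isolated nodes is dropped)
            share = p[u] / len(nbrs)  # walker moves to a uniformly chosen neighbor
            for v in nbrs:
                nxt[v] += share
        p = nxt
        landing.append(p)
    return landing


def diffuse(landing, theta):
    """Diffusion score per node: sum over k of theta[k] * p_{k+1}[v]."""
    n = len(landing[0])
    return [sum(t * p[v] for t, p in zip(theta, landing)) for v in range(n)]


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]

# Hypothetical fixed step weights; an adaptive scheme would learn these.
theta = [0.5, 0.3, 0.2]

landing_a = landing_probabilities(adj, seeds=[0], num_steps=3)  # class A seed
landing_b = landing_probabilities(adj, seeds=[5], num_steps=3)  # class B seed
score_a = diffuse(landing_a, theta)
score_b = diffuse(landing_b, theta)

predicted = ["A" if a > b else "B" for a, b in zip(score_a, score_b)]
print(predicted)
```

On this toy graph, nodes 1 and 2 are assigned to the class seeded at node 0, and nodes 3 and 4 to the class seeded at node 5. Choosing geometrically decaying weights would correspond to a truncated personalized-PageRank-style diffusion; other weightings give other diffusions over the same landing probabilities.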

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start:
Project End:
Budget Start: 2019-10-01
Budget End: 2022-09-30
Support Year:
Fiscal Year: 2019
Total Cost: $702,232
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455