Many emerging high-performance applications from domains of national importance (e.g., astrophysics, computational chemistry, drug discovery, and nuclear physics) represent their data as large graphs, with millions of nodes and the edges connecting them. They perform different types of analytics on these graphs to understand and extract useful information from this massive data. While graph-based analytics is fast becoming a fundamental part of high-performance computing, existing graph analytics methods do not scale with increased hardware resources, mostly employ conventional machine learning and graph analysis strategies, and do not take full advantage of emerging heterogeneous compute and storage elements. As a result, the time-to-insight from large-scale graph-based data increases significantly, slowing down scientific discovery. Motivated by this observation, this NSF-funded project takes a holistic view of Graph Neural Networks (GNNs), a type of neural network that operates directly on graph structures. It explores GNNs as a primary tool for optimizing high-performance applications that benefit from machine learning and data analytics. The ultimate goal of the project is to make the transition from existing machine learning mechanisms to GNNs smooth and effective across a variety of application domains. By facilitating more efficient and cost-effective use of the hardware resources provided by custom clusters, supercomputers, and cloud systems, this project is also expected to lower the barrier to entry into GNNs for a broad population of researchers, practitioners, and machine learning companies.
The educational and outreach components of this research include 1) undergraduate student involvement via vertically integrated research projects; 2) a new graduate course on GNNs; 3) participation in the Science-U program (a summer science camp for K-12 students) at Penn State; and 4) summer workshops for high school girls and high school teachers.

More specifically, this project: 1) explores the theoretical foundations of GNNs with the goal of identifying the root causes of convergence and scalability problems, which are critical to address when employing GNNs in high-performance applications; 2) investigates architecture-agnostic programming language support for GNN computations, focusing in particular on developer productivity, language expressiveness, and ease of extensibility (to ensure that it interoperates with existing programming paradigms, models, and tools); 3) explores compiler support for automatically optimizing and mapping GNN applications onto emerging hardware platforms (including multicore CPUs, GPUs, and FPGAs, as well as their ensembles); 4) develops custom architecture support for GNN computations, with the goal of exceeding the performance of current programmable hardware options; 5) carries out an end-to-end experimental evaluation of GNN-based applications to identify the aspects that require further attention; and finally 6) develops a GNN-based benchmark suite that can be used on a wide variety of hardware platforms. Together, these six components form a multi-layer ecosystem designed to understand and optimize high-performance, large-scale applications that can benefit from graph-driven learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Computing and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
2008398
Program Officer
Almadena Chtchelkanova
Budget Start
2020-10-01
Budget End
2023-09-30
Fiscal Year
2020
Total Cost
$500,000
Name
Pennsylvania State University
City
University Park
State
PA
Country
United States
Zip Code
16802