Graphs are ubiquitous and often serve as the fundamental data structure in applications including bioinformatics, chemistry, healthcare, social networks, recommender systems, and systems analysis. Machine learning (ML) on graphs is receiving increasing attention, both where graphs represent the data, as in graph neural network (GNN) algorithms, and where graphs provide an efficient ML model representation, as in arithmetic-circuit representations of probabilistic graphical models. While useful, graph-based ML poses unique challenges to existing computing hardware (central processing units and graphics processing units): the graph structure imposes irregular memory access and dynamic parallelism, while the relevant learning algorithms require dense computation. Hardware acceleration is nonetheless highly desirable to enable real-time processing of the data streams such applications generate. This project addresses these challenges with a novel accelerator architecture for graph-based ML, along with a supporting open-source software stack, simulator, and field-programmable gate array (FPGA) prototype. Beyond the technical contributions, the project will integrate the latest research into several graduate and upper-division undergraduate courses. The project will also work with the UCLA Center for Excellence in Engineering and Diversity (CEED) and Women in Engineering to recruit a diverse group of undergraduate and graduate students to participate in the research.

The project targets a programmable and heterogeneous multi-accelerator architecture with software-controlled compute and memory resources, specialized in the following ways to meet the needs of graph-based machine learning. First, it supports composing accelerator engines to efficiently pipeline graph-based prefetching with dense computation units. Second, the prefetching hardware will be co-designed with GNN algorithms to support recent and upcoming advances in graph-sampling and graph-coarsening algorithms. Third, it will include a high-bandwidth scratchpad architecture optimized for indirect access, and spatial compute fabrics (e.g., systolic arrays) optimized for dense computation. Finally, the execution model will be based on an architecture-aware task-parallel model, with primitives rich enough to take advantage of heterogeneous hardware while remaining flexible enough to load-balance dynamic parallelism. The key components of the proposed architecture will be prototyped on an FPGA. Overall, the goal of the work is to greatly advance the state of the art of graph-based ML in terms of model accuracy, efficiency, and real-time inference and learning. The project will also collaborate with a synergistic DARPA program for related hardware development.
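The workload pattern motivating this design, irregular gathers feeding dense computation, can be illustrated with a minimal sketch. The toy graph, CSR layout, and weight matrix below are illustrative assumptions, not artifacts of the project; the sketch only shows why GNN inference pairs a data-dependent indirect-access phase (the kind the prefetching hardware and scratchpad target) with a regular dense phase (the kind suited to systolic arrays):

```python
import numpy as np

# Toy graph of 5 nodes in CSR form (hypothetical example data).
indptr = np.array([0, 2, 3, 5, 6, 8])           # per-node offsets into `indices`
indices = np.array([1, 2, 0, 3, 4, 0, 1, 2])    # flattened neighbor lists
X = np.arange(20, dtype=np.float64).reshape(5, 4)  # 4-dim node features
W = np.eye(4)                                    # dense weight matrix (identity for clarity)

# Phase 1: irregular, data-dependent gather -- neighbor feature aggregation.
# Access pattern depends on graph structure, so it defies caches/prefetchers.
agg = np.zeros_like(X)
for v in range(indptr.size - 1):
    nbrs = indices[indptr[v]:indptr[v + 1]]      # indirect (gather) access
    agg[v] = X[nbrs].sum(axis=0)

# Phase 2: regular, dense compute -- per-node feature transform (matmul),
# the part that maps naturally onto a spatial compute fabric.
H = agg @ W
print(H.shape)  # → (5, 4)
```

Pipelining these two phases, so the gather engine stages neighbor features into the scratchpad while the dense fabric consumes the previous tile, is the composition the abstract describes.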

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2019-10-01
Budget End: 2022-09-30
Fiscal Year: 2019
Total Cost: $1,499,997
Name: University of California Los Angeles
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095