With advances in device technology, Field-Programmable Gate Arrays (FPGAs) have become an attractive choice for scientific computing. Several research groups and vendors are developing high-performance computer systems that employ FPGAs for application acceleration. These hybrid systems integrate general-purpose processors, FPGAs, and a memory hierarchy of SRAM and DRAM, and they pose new design challenges in optimizing overall performance. The challenges to achieving high performance include managing the shared memory hierarchy, partitioning work among multiple FPGAs, and hardware/software co-design between the general-purpose processors and the FPGAs.
This research develops a high-performance linear algebra library for FPGA-accelerated systems. The operations considered include reduction of a series of floating-point values, data path synthesis using deeply pipelined floating-point units (FPUs), sparse matrix-vector multiplication, and dense matrix computations. These kernels are fundamental operations in many scientific applications. The library is parameterized by the available configurable logic, on-chip memory (Block RAM), SRAM capacity and bandwidth, and the DRAM bandwidth available through the interconnection network. Algorithmic exploration is performed for hybrid computing platforms that consist of processors, reconfigurable logic, and a user-controlled memory hierarchy. This exploration includes: (1) optimal algorithms that exploit the memory hierarchy and reconfigurable logic; (2) parameterized IP cores based on the design space characterized by available logic, SRAM, and memory bandwidth; (3) hardware/software partitioning to exploit the computational resources; (4) synthesis of optimal data paths for arithmetic expression evaluation, including reduction circuits; and (5) demonstration on state-of-the-art high-end computing platforms from leading supercomputing vendors and research groups.
Comparisons against highly optimized code developed for general-purpose processors are performed on well-defined benchmarks, using comparable architectural resources such as processor-memory bandwidth, memory, and logic.
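
As an illustration of two of the kernels named above, the following is a minimal software sketch, not the library's actual design. It assumes a pipelined floating-point adder with a latency of `latency` cycles: a single running sum would stall such an adder, so the model keeps one partial sum per pipeline stage and combines them at the end. The function names (`interleaved_reduce`, `csr_spmv`), the default latency of 4, and the choice of the compressed sparse row (CSR) layout are assumptions made for this example only.

```python
from typing import List, Sequence


def interleaved_reduce(values: Sequence[float], latency: int) -> float:
    """Software model of summing a stream with a pipelined adder.

    Assumption: the adder takes `latency` cycles per addition, so the
    stream is spread over `latency` independent partial sums.
    """
    if latency < 1:
        raise ValueError("latency must be at least 1")

    # One partial sum per (hypothetical) pipeline stage.
    partial = [0.0] * latency
    for i, v in enumerate(values):
        partial[i % latency] += v

    # Combine the partial sums pairwise, as an adder tree would.
    while len(partial) > 1:
        paired = [partial[i] + partial[i + 1]
                  for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            paired.append(partial[-1])
        partial = paired
    return partial[0]


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float],
             latency: int = 4) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form.

    Each row's products are summed with interleaved_reduce, mirroring
    how a reduction circuit would follow the multipliers.
    """
    y = []
    for r in range(len(row_ptr) - 1):
        products = [vals[k] * x[col_idx[k]]
                    for k in range(row_ptr[r], row_ptr[r + 1])]
        y.append(interleaved_reduce(products, latency))
    return y
```

Because floating-point addition is not associative, interleaving the partial sums can change the rounding of the result relative to a strictly sequential sum; a hardware reduction circuit faces the same trade-off.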