Sparse direct methods form the backbone of many applications in computational science, but the methods have not kept pace with advances in heterogeneous computing architectures. High-end systems can be built with multiple general-purpose CPU cores coupled with one or more Graphics Processing Units (GPUs), each with hundreds of simple yet fast computational cores. This project develops high-performance parallel sparse direct methods that exploit GPU-based architectures to achieve orders-of-magnitude gains in computational performance. The focus is on single- and multiple-GPU algorithms for multifrontal sparse QR factorization, which is numerically stable and widely applicable. The nonuniform and hierarchical structure of sparse QR factorization, together with the unique features of the GPU, requires the development of novel algorithms. These must manage a simultaneous mix of regular computations inside each frontal matrix and irregular computations in the assembly process between nodes of the computational tree and between concurrent subtrees.
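The multifrontal pattern just described, regular dense kernels within each frontal matrix and irregular assembly between tree nodes, can be sketched in a few lines. This is a toy illustration only: the matrices, the two-leaf tree, and the helper `front_qr` are invented for exposition and are not the project's implementation. Each node stacks its rows with its children's contribution blocks, runs a dense QR (the regular, GPU-friendly work), keeps the eliminated rows of R, and passes the trailing block up to its parent (the irregular assembly).

```python
import numpy as np

def front_qr(front, n_pivot):
    """Dense QR on one frontal matrix. Returns the eliminated rows of R
    (final rows of the global R factor) and the contribution block that
    is assembled into the parent front."""
    R = np.linalg.qr(front, mode='r')   # regular fine-grain computation
    return R[:n_pivot, :], R[n_pivot:, n_pivot:]

# Two leaf fronts of a toy elimination tree. leaf1 spans columns 0..2
# and eliminates column 0; leaf2 spans columns 1..2.
leaf1 = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 0.0, 2.0]])
R1, cb1 = front_qr(leaf1, n_pivot=1)

leaf2 = np.array([[1.0, 2.0],
                  [4.0, 0.0]])

# The parent assembles its children's contribution blocks with its own
# rows (the irregular coarse-grain step), then eliminates columns 1..2.
parent = np.vstack([cb1, leaf2])
R2, _ = front_qr(parent, n_pivot=2)

# R1[0] and R2 together form the R factor of the whole sparse matrix,
# up to the usual per-row sign convention of Householder QR.
```

Because the R factor of stacked rows can be computed by factorizing the pieces and then the stacked R blocks, the assembled result matches a QR of the full matrix, which is what makes the tree-structured computation valid.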
An efficient sparse QR factorization is an essential kernel in many problems in computational science. It can be used to solve sparse linear systems, sparse linear least-squares problems, and eigenvalue problems, to determine rank and null spaces, and to address many other problems in numerical linear algebra. Application areas that can exploit the results of this research include structural engineering, computational fluid dynamics, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, optimization, circuit simulation, economic and financial modeling, chemical process simulation, text/document networks, and many others. QR factorization is representative of many other sparse direct methods, with both irregular coarse-grain parallelism and regular fine-grain parallelism, so the methodologies developed here carry over directly to those methods. The work thus has broad impact on computational linear algebra, optimization, and related application areas. The PI's research extends beyond these specific applications of numerical linear algebra: it demonstrates how problems with a mixture of irregular and regular computation can be carried out on the challenging yet promising landscape of GPU computing, and it opens the door to many other kinds of applications. The investigator and his colleagues plan to produce and distribute high-quality software as a result of this work, building on a 20-year track record of doing so.
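To make the least-squares use case concrete, here is a minimal sketch of how a QR factorization solves an overdetermined system min ||Ax - b||. The example uses a small dense matrix and numpy purely for illustration; a sparse multifrontal QR applies the same mathematics while exploiting sparsity.

```python
import numpy as np

# Fit a line y = x0 + x1*t through three points (t, y):
# (0, 1), (1, 2), (2, 2). The system A x = b is overdetermined.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# A = Q R with Q having orthonormal columns; then the least-squares
# solution satisfies the triangular system R x = Q^T b.
Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)
```

Solving via QR gives the same solution as the normal equations A^T A x = A^T b, but avoids squaring the condition number of A, which is why QR is the numerically stable choice for least squares.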