Until recently, the speed of computer processors could be increased by packing more transistors into a smaller area and increasing the clock frequency. This trend is now unsustainable because of power constraints, with only moderate gains expected going forward even when hundreds or thousands of traditional cores are placed on a chip. Thus, how to reduce power consumption while increasing performance has become a core concern in processor design. It is well accepted that specialization (designing parts of the processor for a specific task) and heterogeneity (designating different parts of the processor for different tasks) can lead to orders-of-magnitude improvements in both aspects. The open question is whether such efficiency can be maintained while providing enough flexibility to implement a broad class of operations. Leveraging unique domain expertise, research under this project addresses this question for the domain of matrix computations, which are at the core of many computational advances, both in scientific high-performance computing and in the embedded, mobile, and cyber-physical domains.
Observing that the largest benefits can be obtained through specialization at the foundations, this project aims to co-design algorithms and architectures to directly realize basic linear algebra methods in an optimized combination of hardware and software. By designing a specialized Linear Algebra Processor (LAP), it is possible to achieve efficiency improvements of one to two orders of magnitude compared to traditional or proposed computer architectures. The questions that the project will answer, through a combination of analysis, simulation, and prototyping, include: (1) how best to design LAPs that can efficiently execute the full set of linear algebra routines; and (2) how LAPs can be scaled, networked into clusters, and integrated with application software running on one or more host processors. The broad goal of this project is to develop novel, integrated linear algebra compute fabrics that are co-optimized and co-designed across all layers, from the basic hardware foundations all the way to application programming support through standard linear algebra software packages. This project is expected to result in a leap in computational science and discovery capabilities, thus enabling novel breakthroughs in industry, for the consumer, at the national labs, in education, and by scientists in academia.