Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. Moreover, open problems remain in adapting these algorithms to realistic applications that lack the perfect regularity of simple examples. This project's investigation of cache-oblivious stencil compilation enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.
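The nested-loop formulation described above can be sketched as follows. This is a minimal illustration, not code from the project. It assumes a 1D grid, a three-point explicit heat-equation stencil with a hypothetical diffusion coefficient `alpha`, periodic boundaries, and two stored time levels. The function name `heat1d_loops` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: a 1D three-point stencil (explicit heat equation)
// written as naive nested loops. Each sweep over space computes time t+1
// from time t; only two time levels are kept in memory.
// Boundary (assumed for illustration): periodic, so x-1 at x=0 wraps to n-1.
std::vector<double> heat1d_loops(std::vector<double> u, int steps, double alpha) {
    const int n = static_cast<int>(u.size());
    assert(n >= 1 && steps >= 0);
    std::vector<double> next(u.size());
    for (int t = 0; t < steps; ++t) {          // outer loop: time
        for (int x = 0; x < n; ++x) {          // inner loop: space
            const double left  = u[(x - 1 + n) % n];
            const double right = u[(x + 1) % n];
            next[x] = u[x] + alpha * (left - 2.0 * u[x] + right);
        }
        u.swap(next);                          // time t+1 becomes "current"
    }
    return u;
}
```

Every time step sweeps the entire grid, so once the grid no longer fits in cache, each step streams the whole array through memory again. This is the poor cache behavior of looping implementations that the paragraph above refers to.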
The research project is developing a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes; and general complex dependencies, such as push dependencies, horizontal dependencies, and dynamic dependencies. To achieve these goals, the researchers are developing provably good algorithms for complex stencil computations; exploring how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, and parallel computation; investigating how to run stencils efficiently on a wide variety of architectures, such as multicore processors, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; demonstrating the effectiveness of their research by developing a production-quality stencil compiler; and developing a benchmark suite and benchmarking system for evaluating Pochoir.
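Several of the boundary conditions listed above amount to a rule for what a stencil reads when a neighbor index falls off the grid. The sketch below illustrates that idea for a 1D grid. It is an assumption-laden illustration, not Pochoir's boundary API: the names `Boundary` and `read_with_boundary` are hypothetical, "mirrored" uses one common convention (reflection about the edge points), and the zero-gradient case is just one simple discrete approximation of a zero-flux Neumann condition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical boundary policies for a 1D grid (illustrative names only,
// not Pochoir's API). Each decides what a stencil sees at an index x
// that lies outside [0, n).
enum class Boundary { Periodic, Constant, Mirrored, ZeroGradient };

double read_with_boundary(const std::vector<double>& u, int x, Boundary bc,
                          double outside_value = 0.0) {
    const int n = static_cast<int>(u.size());
    assert(n >= 2);                    // mirroring below needs at least two points
    if (x >= 0 && x < n) return u[x];  // interior point: read directly
    switch (bc) {
    case Boundary::Periodic:           // wrap around, as on a ring
        return u[((x % n) + n) % n];
    case Boundary::Constant:           // Dirichlet-style fixed value outside
        return outside_value;
    case Boundary::Mirrored: {         // reflect about the edge points: -1 -> 1, n -> n-2
        const int period = 2 * (n - 1);
        int y = ((x % period) + period) % period;
        if (y >= n) y = period - y;
        return u[y];
    }
    case Boundary::ZeroGradient:       // copy the nearest edge value, a simple
                                       // approximation of a zero-flux Neumann condition
        return x < 0 ? u[0] : u[n - 1];
    }
    return outside_value;              // unreachable for valid enum values
}
```

Keeping the interior test first means the common case pays only one range check. A compiler that knows which points are interior can skip the boundary logic entirely for them.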
This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines characteristic of multicore processors and run fast on a diverse set of hardware platforms. This research eases the development and maintenance of a wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, benefiting these application areas, as well as society at large.
Many high-end scientific applications perform stencil computations in their inner loops. A stencil defines the value of a grid point in a d-dimensional spatial grid at time t as a function of neighboring grid points at recent times before t. Stencil computations are conceptually simple to implement using nested loops, but looping implementations suffer from poor cache performance on multicore processors. Cache-oblivious divide-and-conquer stencil codes can achieve an order of magnitude improvement in cache efficiency over looping implementations, but most programmers find it difficult to write cache-oblivious stencil codes. This project enables ordinary programmers of stencil computations to enjoy the benefits of multicore technology without requiring them to write code any more complex than naive nested loops.

This research developed a language embedded in C++ that can express stencil computations concisely and can be compiled automatically into highly efficient algorithmic code for multicore processors and other platforms. The Pochoir stencil compiler compiles stencil computations that exhibit

* complex boundary conditions, such as periodic, constant, Dirichlet, Neumann, mirrored, and phase factors; and
* irregularities, including macroscopic and microscopic inhomogeneities, as well as irregular shapes.

To achieve these goals, the researchers

* developed provably good algorithms for complex stencil computations;
* explored how domain-specific compiler technology can achieve speedups from efficient cache management, processor-pipeline scheduling, chromatic scheduling, and parallel computation;
* investigated how to run stencils efficiently on a wide variety of architectures, such as multicore processors, distributed-memory clusters, graphics processing units, FPGAs, and future exascale machines; and
* demonstrated the effectiveness of their research by developing a production-quality stencil compiler.
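The cache-oblivious divide-and-conquer idea mentioned above can be illustrated with a sketch in the style of Frigo and Strumpen's trapezoidal decomposition for a 1D stencil of radius 1. This is an illustration of the general technique, not the code Pochoir generates. The struct `Heat1D`, its members (`kernel`, `walk`, `run_loops`, `run_trapezoid`, `result`), the coefficient `alpha`, and the zero Dirichlet boundary are all hypothetical choices made for the example. A space-time trapezoid is cut in space along a slope of -1 when it is wide, and in time when it is tall, so that small subproblems eventually fit in cache without the code knowing the cache size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of a trapezoidal (cache-oblivious) walk for a 1D,
// 3-point stencil. Assumed boundary: value 0 outside [0, n).
// Two time levels are stored in u[0] and u[1], selected by t & 1.
struct Heat1D {
    std::vector<double> u[2];
    double alpha;
    int n;

    Heat1D(const std::vector<double>& init, double a)
        : alpha(a), n(static_cast<int>(init.size())) {
        assert(!init.empty());
        u[0] = init;
        u[1].assign(init.size(), 0.0);
    }

    double at(int t, int x) const {
        return (x < 0 || x >= n) ? 0.0 : u[t & 1][x];
    }

    // Compute grid point (t + 1, x) from its neighbors at time t.
    void kernel(int t, int x) {
        u[(t + 1) & 1][x] =
            at(t, x) + alpha * (at(t, x - 1) - 2.0 * at(t, x) + at(t, x + 1));
    }

    // Reference implementation: plain nested loops.
    void run_loops(int T) {
        for (int t = 0; t < T; ++t)
            for (int x = 0; x < n; ++x) kernel(t, x);
    }

    // Trapezoid over times [t0, t1); at time t it covers
    // x in [x0 + dx0 * (t - t0), x1 + dx1 * (t - t0)).
    void walk(int t0, int t1, int x0, int dx0, int x1, int dx1) {
        const int dt = t1 - t0;
        if (dt == 1) {
            for (int x = x0; x < x1; ++x) kernel(t0, x);
        } else if (dt > 1) {
            if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {
                // Wide trapezoid: space cut along a line of slope -1,
                // left piece first because the right piece depends on it.
                const int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4;
                walk(t0, t1, x0, dx0, xm, -1);
                walk(t0, t1, xm, -1, x1, dx1);
            } else {
                // Tall trapezoid: time cut into a lower and an upper half.
                const int s = dt / 2;
                walk(t0, t0 + s, x0, dx0, x1, dx1);
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
            }
        }
    }

    // Whole domain: vertical (slope 0) edges at x = 0 and x = n.
    void run_trapezoid(int T) { walk(0, T, 0, 0, n, 0); }

    // After T steps the current time level lives in u[T & 1].
    const std::vector<double>& result(int T) const { return u[T & 1]; }
};
```

Both traversals evaluate every grid point with the same kernel and the same inputs, so they produce identical results; only the order of evaluation, and hence the reuse of cached data, differs.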
Intellectual merit: Real stencil applications often exhibit complex irregularities and dependencies, making it difficult for programmers to produce efficient multicore code for them or to migrate them to other modern hardware platforms. Even simple stencils are hard to code for performance. This research attacked the difficult problem of generating, from simple-to-write linguistic specifications, high-efficiency cache-oblivious code for stencil computations that makes good use of the memory hierarchy and processor pipelines. This effort required cross-domain technical expertise, including an understanding of multicore programming, strong theoretical skills to develop efficient parallel algorithms and data structures, systems experience to build and tune a compiler and runtime system, knowledge of the real applications this technology will benefit, and an aesthetic sense for language design.

Broad impact: This research enables scientific researchers and others to easily produce highly efficient codes for complex stencil computations. The codes make good use of the memory hierarchy and processor pipelines characteristic of multicore processors and will run fast on a diverse set of hardware platforms. A wide variety of stencil-based applications, ranging across physics, biology, chemistry, energy, climate, mechanical and electrical engineering, finance, and other areas, will become easier to develop and maintain, benefiting these application areas, as well as society at large.