Parallel programs are typically written in a style in which independent processes are expressed sequentially and use locally expressed message passing both to exchange data and to synchronize with one another. From the network relationships implied by this messaging arise implicit data structures and implicit algorithms - that is, global parallel data structures and global parallel algorithms for which there is no explicit global representation. With only this local approach available for creating parallel programs, reasoning about the correctness and performance of large-scale parallel algorithms is rapidly moving beyond our abilities. To address this shortcoming, this project develops a declarative language approach for describing new high-level communication operations, a means for composing these operations with computations, and a means for expressing transformations that optimize the resulting compositions.
This project's implementation plan is based on aggressive evolution of the technology that is currently most effective in large-scale parallel computing - namely, explicitly managed shared data. In the distributed-memory case this discipline is realized with message passing, but it applies equally in the shared-memory case. The project's approach may enable higher levels of expressiveness and abstraction for data management, communication, and coordination. Moreover, the separation of concerns that it naturally imposes between communication and computation greatly simplifies the programmer's mental model and implementation effort. So that this division does not sacrifice performance, the scheme's composition and transformation mechanisms will allow communication and computation to be automatically combined in an optimized fashion, achieving high performance and high scalability. To facilitate adoption of the resulting paradigms and tools by practicing programmers, integration with education is essential. Accordingly, this project will directly train undergraduate, graduate, and postdoctoral researchers and will develop supporting curricula and materials for both students and practicing programmers. Close collaboration with large-scale, real-world scientific applications will further increase the practical relevance of this work.
Many successful and important applications have been written using the single program, multiple data (SPMD) style with the Message Passing Interface (MPI) for communication. However, even in its early days, MPI was intended to be a transition technology that would enable yet more effective parallel programming approaches. Today it is widely recognized that message passing as it is practiced with MPI is too low-level, too complicated, and not well matched to shared-memory programming or to fine-grained, highly asynchronous execution models. Inevitable advances in high-performance computing systems have only exacerbated these shortcomings.

To better understand these shortcomings, we started by analyzing communication patterns that are common in parallel applications. We found that many applications use a bulk-synchronous parallel model, in which programs are divided into supersteps, each consisting of separate computation and communication phases (often with explicit or implicit synchronization); the first sketch below shows this structure in miniature. We also found that dynamic and sparse communication patterns are important in newer applications such as graph analysis: communication patterns change over time based on the program's input data, and each processor communicates with only a small number of other processors. Finally, hardware trends show an increased reliance on special-purpose computation accelerators, driven in large part by concerns over energy efficiency: a special-purpose processor can often perform better while using far less energy than a general-purpose processor.

These observations informed the development of two new programming languages: Kanor, for managing communication in large-scale distributed systems, and Harlan, for managing the interaction between the main processor and the accelerator in hybrid computers.

Kanor provides a declarative language for specifying communication patterns in parallel programs. Kanor's lightweight syntax allows programmers to compactly specify what data needs to move where, while the details of how best to execute the transfer are left to Kanor. This creates a clean separation between the communication and computation code, capturing the essence of the communication without specifying irrelevant details. This in turn gives the implementation more freedom to optimize the communication code, for example by overlapping communication with computation or by taking better advantage of intra-node shared memory. The second sketch below illustrates this division of labor in plain MPI terms.

Hybrid systems built around accelerators bring their own programming challenges. To alleviate these, we have designed and implemented Harlan, a language for high-level programming of accelerators. Like Kanor, Harlan provides lightweight syntax for indicating which portions of the program should run on the host processor and which portions should run on the accelerator, again providing a clean separation. This placement implicitly specifies when data needs to be accessible to each processor, and Harlan uses it to automatically manage the communication between the host processor and the accelerator. Harlan allows developers to quickly write applications targeting accelerators with far less effort than existing approaches. The third sketch below illustrates this data-management idea.
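As a first sketch, here is a minimal C++/MPI illustration of one bulk-synchronous superstep. The specific computation (a local partial sum combined with MPI_Allreduce) is a toy chosen purely for illustration, not drawn from any particular application.

```cpp
// Minimal sketch of one BSP superstep: compute locally, then communicate
// (the collective also acts as the synchronization point between supersteps).
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // --- Computation phase: each process works on purely local data. ---
    std::vector<double> local(1000, rank + 1.0);
    double partial = 0.0;
    for (double x : local) partial += x;

    // --- Communication phase: exchange results; every process blocks here
    //     until the collective completes, separating this superstep from
    //     the next. ---
    double total = 0.0;
    MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %g\n", total);
    MPI_Finalize();
    return 0;
}
```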
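The second sketch illustrates the declarative style that Kanor targets. It is deliberately not Kanor syntax; it is plain C++ with MPI, and the Transfer record and execute_transfers routine are hypothetical names introduced here. The point is the division of labor: application code only declares what moves where, and the executor is free to decide how (it could batch, reorder, or overlap the operations without any change to the application).

```cpp
// Plain C++/MPI illustration of declarative communication (not Kanor syntax).
// The application *declares* transfers; execute_transfers decides how to
// carry them out. `Transfer` and `execute_transfers` are hypothetical names.
#include <mpi.h>
#include <cstdio>
#include <vector>

struct Transfer {   // one rule: rank `src` sends `value` to rank `dst`
    int dst;
    int src;
    double value;   // payload (meaningful on the sending rank)
};

void execute_transfers(MPI_Comm comm, const std::vector<Transfer>& rules,
                       std::vector<double>& inbox) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    inbox.assign(rules.size(), 0.0);
    std::vector<MPI_Request> reqs;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (rules[i].src == rank) {      // this rank is the sender
            reqs.push_back(MPI_REQUEST_NULL);
            MPI_Isend(&rules[i].value, 1, MPI_DOUBLE, rules[i].dst,
                      static_cast<int>(i), comm, &reqs.back());
        }
        if (rules[i].dst == rank) {      // this rank is the receiver
            reqs.push_back(MPI_REQUEST_NULL);
            MPI_Irecv(&inbox[i], 1, MPI_DOUBLE, rules[i].src,
                      static_cast<int>(i), comm, &reqs.back());
        }
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(),
                MPI_STATUSES_IGNORE);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // A ring shift, stated declaratively: every rank sends its id to its
    // right neighbor. Changing the pattern means changing only these rules.
    std::vector<Transfer> rules;
    for (int p = 0; p < size; ++p)
        rules.push_back({(p + 1) % size, p, static_cast<double>(p)});

    std::vector<double> inbox;
    execute_transfers(MPI_COMM_WORLD, rules, inbox);
    for (std::size_t i = 0; i < rules.size(); ++i)
        if (rules[i].dst == rank)
            std::printf("rank %d received %g\n", rank, inbox[i]);
    MPI_Finalize();
    return 0;
}
```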
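The third sketch illustrates Harlan's data-management idea. Harlan itself is a standalone language with its own Scheme-like syntax, which we do not attempt to reproduce here; instead, this plain C++ sketch shows how placement, rather than explicit copy calls, can drive data movement. The device_buffer and run_on_accelerator names are hypothetical, and the "device" is simulated with host memory so the sketch is self-contained and runnable without an accelerator.

```cpp
// Plain C++ sketch (not Harlan syntax) of automatic host/accelerator data
// management: user code marks WHERE computation runs; the runtime infers
// WHEN data must move. device_buffer and run_on_accelerator are
// hypothetical names, and the "device" is simulated with host memory.
#include <cstdio>
#include <vector>

template <typename T>
class device_buffer {
    std::vector<T> host_;
    std::vector<T> device_;          // stand-in for accelerator memory
    bool on_device_ = false;
public:
    explicit device_buffer(std::vector<T> data) : host_(std::move(data)) {}

    // Called by the runtime before a kernel launch: copy host -> device
    // only if the data is not already resident there.
    std::vector<T>& to_device() {
        if (!on_device_) { device_ = host_; on_device_ = true; }
        return device_;
    }

    // Called when host code reads the data: copy device -> host on demand.
    const std::vector<T>& to_host() {
        if (on_device_) { host_ = device_; on_device_ = false; }
        return host_;
    }
};

// "Kernel launch": the runtime moves data as needed; user code never
// writes an explicit transfer.
template <typename T, typename F>
void run_on_accelerator(device_buffer<T>& buf, F per_element) {
    for (T& x : buf.to_device()) x = per_element(x);  // runs "on device"
}

int main() {
    device_buffer<double> xs(std::vector<double>{1, 2, 3, 4});

    // Two launches in a row: the second reuses device-resident data, so
    // no copy happens between them.
    run_on_accelerator(xs, [](double x) { return x * x; });
    run_on_accelerator(xs, [](double x) { return x + 1; });

    for (double x : xs.to_host()) std::printf("%g\n", x);  // copy-back here
    return 0;
}
```

The design point is that residency tracking makes redundant transfers disappear automatically; in Harlan this falls out of marking which expressions run on the accelerator, with no copy calls in user code at all.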
In addition to the technical products of this project, we have trained numerous Ph.D. students and postdoctoral researchers for careers in computer science research. Our results have been published in traditional academic venues (conferences and journals), and both the technology press and special-purpose industry publications have covered our language work.