High-end distributed-memory and distributed shared-memory platforms with millions of processors will be deployed in the near future to solve the toughest technical problems. Their individual nodes will be heterogeneous, multithreaded, multicore systems capable of executing many threads of control, but with relatively little memory per thread, low bandwidth to main memory, and deep memory hierarchies. A programming model that supports productive, portable, efficient parallel programming both within and across the nodes of these petascale systems is essential if their potential is to be realized. Since it is easier for application developers and tool vendors to extend existing software than to adopt a new programming language, a programming model based upon a familiar paradigm is highly desirable.
OpenMP is a widely supported shared-memory programming model that eases program development and maintenance. It is well suited to programming multicore nodes, but it does not address the needs of distributed-memory platforms. This research will significantly extend OpenMP so that it can be used to program all levels of a high-end petascale system. To accomplish this, the investigators will enhance OpenMP's existing mechanisms for describing multiple levels of parallelism, provide additional features for specifying synchronization and for achieving high levels of data locality, and develop a novel I/O interface. Moreover, they will substantially improve the state of the art in OpenMP implementation technology, enabling high performance across nodes as well as within them. Results will be demonstrated via a state-of-the-art Fortran/C/C++ OpenMP compiler, a highly optimized communications library, and a range of large-scale applications.
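To make the notion of "multiple levels of parallelism" concrete, the following is a minimal sketch of the existing OpenMP mechanism the proposal would build on: nested parallel regions, where an outer team of threads (e.g., one per socket or node-like unit) each spawns an inner team of workers. The proposal's actual extensions are not shown; the two-level split, team sizes, and use of omp_set_max_active_levels() here are illustrative assumptions only.

/* Illustrative two-level OpenMP parallelism (standard OpenMP, not the
 * proposed extensions). Compile with, e.g.: cc -fopenmp nested.c -o nested */
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_max_active_levels(2);       /* permit two nested parallel levels */

    /* Outer level: coarse-grained parallelism, one thread per "node-like"
     * unit in this sketch (the count 2 is an assumption for the demo). */
    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();

        /* Inner level: fine-grained parallelism within each outer thread. */
        #pragma omp parallel num_threads(4)
        {
            int inner = omp_get_thread_num();
            printf("outer thread %d, inner thread %d\n", outer, inner);
        }
    }
    return 0;
}

In a petascale setting, the outer level would correspond to parallelism across nodes and the inner level to threads within a multicore node; enhancing how such levels are expressed, synchronized, and mapped for locality is precisely the gap the proposed extensions target.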