Most traditional High-End Computing (HEC) applications and current petascale applications are written using the Message Passing Interface (MPI) programming model, and some are run in MPI+OpenMP mode. However, it can be very difficult to use MPI or MPI+OpenMP while maintaining performance for applications that exhibit irregular and dynamic communication patterns. The Partitioned Global Address Space (PGAS) programming model offers a more flexible way for such applications to express parallelism. Accelerators introduce additional programming models such as CUDA, OpenCL, and OpenACC. Thus, emerging heterogeneous architectures require support for various hybrid programming models: MPI+OpenMP, MPI+PGAS, and MPI+PGAS+OpenMP, with extended APIs for multiple levels of parallelism. Unfortunately, there is no unified runtime that delivers the best performance and scalability for all of these hybrid programming models across a range of applications on current and next-generation HEC systems. This leads to the following broad challenge: "Can a unified runtime for hybrid programming models be designed which can provide benefits that are greater than the sum of its parts?"
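To make the notion of a hybrid programming model concrete, the following minimal MPI+OpenMP sketch (an illustrative example, not taken from the proposal; the toy reduction and variable names are hypothetical) shows the two levels of parallelism such a runtime must coordinate: OpenMP threads sharing work within a node and MPI processes exchanging results across nodes.

/* Illustrative sketch: MPI for inter-process communication,
 * OpenMP for intra-node, thread-level parallelism. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* Request thread support so OpenMP threads can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0, global_sum = 0.0;
    /* Node-level parallelism: OpenMP threads split the local loop. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0 / (double)(rank * 1000000 + i + 1);

    /* System-level parallelism: MPI combines per-process results. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}

Adding a PGAS layer (e.g., UPC or OpenSHMEM) or an accelerator model (e.g., CUDA) on top of such a program introduces additional runtimes that must share network and memory resources, which is precisely the coordination problem a unified runtime is meant to solve.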
A synergistic and comprehensive research plan, involving computer scientists from The Ohio State University (OSU) and the Ohio Supercomputer Center (OSC) and computational scientists from the Texas Advanced Computing Center (TACC) and the San Diego Supercomputer Center (SDSC) at the University of California San Diego (UCSD), is proposed to address this broad challenge with innovative solutions. The investigators will specifically address the following questions: 1) What are the requirements and limitations of using hybrid programming models for a set of petascale applications? 2) What features and mechanisms are needed in a unified runtime? 3) How can the unified runtime and the associated extensions to programming-model APIs be designed and implemented? 4) How can candidate petascale applications be redesigned to take advantage of the proposed unified runtime? and 5) What benefits (in terms of performance, scalability, and productivity) can be achieved by the proposed approach? The research will be driven by a set of applications from established NSF computational science researchers running large-scale simulations on Ranger and other systems at OSC, SDSC, and OSU. The proposed designs will be integrated into the open-source MVAPICH2 library. The established national-scale training and outreach programs at TACC, SDSC, and OSC will be used to disseminate the results of this research.