High-performance computing has reshaped science and industry in many areas. However, the rapid evolution at the hardware level over the last few years have been unmatched by corresponding changes at the programming paradigm level. According to the consensus of several major studies, the degree of parallelism on large systems is expected to increase by several orders of magnitude. As a result, the Message Passing Interface (MPI), which has been the de-facto standard message passing paradigm, lacks an efficient and portable way of handling today's architectures. To efficiently handle such systems, MPI implementations must adopt more asynchronous and thread-friendly behaviors to perform better than they do on today?s systems. Maintaining and further enhancing MPI, one of the most widely-used communication libraries in high-performance computing, will have a far-reaching impact beyond the scientific community, and represents a critical building block for continued advances in all areas of science and engineering. The ADAPT project enhances, hardens and modernizes the Open MPI library in the context of this ongoing revolution in processor architecture and system design. It creates a viable foundation for a new generation of Open MPI components, enabling the rapid exploration of new physical capabilities, providing greatly improved performance portability, and working toward full interoperability between classes of components. More specifically, ADAPT implements fundamental software techniques that can be used in many-core systems to efficiently execute MPI-based applications and to tolerate fail-stop process failures, at scales ranging from current large systems to the extreme scale systems that are coming soon. To improve the efficiency of Open MPI, ADAPT integrates, as a core component, knowledge about the hardware architecture, and allows all layers of the software stack full access to this information. Process placement, distributed topologies, file accesses, point-to-point and collective communications can then adapt to such topological information, providing more portability. The ADAPT team is also updating the current collective communication layer to allow for a task-based collective description contained at a group-level, which in turn adjusts to the intra and inter-node topology. Planned expansion of the current code with resilient capabilities allows Open MPI to efficiently survive hard and soft error types of failures. These capabilities can be used as building blocks for all currently active fault tolerance proposals in the MPI standard body. MPI is already one of the most relevant parallel programming models, the most important brick of most parallel applications, and one of the most critical communication pieces of most other programing models. Thus, the experience of the research team and emerging capabilities can benefit all future users of these programming standards, tools, and libraries--regardless of discipline. Any improvement in the performance and capabilities of a major MPI library such as Open MPI, has tremendous potential for an immediate and dramatic impact on the application communities. In addition to improving the time to solution for their applications, it has the potential to decrease the energy usage and maximize the performance delivered by the existing execution platforms. The scale at which the Open MPI library is used in government research institutions (including universities and national laboratories), as well as in the private sector, is a major vector for a quick impact on all scientific and engineering communities.

Agency
National Science Foundation (NSF)
Institute
Division of Advanced CyberInfrastructure (ACI)
Type
Standard Grant (Standard)
Application #
1339820
Program Officer
Rajiv Ramnath
Project Start
Project End
Budget Start
2013-09-01
Budget End
2016-08-31
Support Year
Fiscal Year
2013
Total Cost
$347,216
Indirect Cost
Name
University of Tennessee Knoxville
Department
Type
DUNS #
City
Knoxville
State
TN
Country
United States
Zip Code
37916