Petascale applications are producing terabytes of data at a rapid rate. Storage systems in large-scale machines are significantly stressed, as I/O rates are not growing fast enough to cope with data production. A variety of HPC activities, such as writing output and checkpoint data, are stymied by this I/O bandwidth bottleneck. Moreover, the post-processing and subsequent analysis/visualization of computational results are increasingly time consuming due to the widening gap between the storage/processing capacities of supercomputers and users' local clusters.

This research focuses on building a novel in-job dynamic data staging architecture and on bringing it to bear on the looming petascale I/O crisis. To this end, the following objectives are investigated: (i) the concerted use of node-local memory and emerging hardware such as Solid State Disks (SSDs), from a dedicated set of nodes, as a means to alleviate the I/O bandwidth bottleneck, (ii) the multiplexing of traditional user post-processing pipelines and secondary computations with asynchronous I/O on the staging ground to perform scalable I/O and data analytics, (iii) bypassing memory to access the staging area (see the sketch below), and (iv) enabling QoS both in the staging ground and in the communication channels connecting it to compute clients and persistent storage.
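
As a concrete illustration of objective (iii), the sketch below shows one conventional way to bypass main memory (the kernel page cache) when writing data to an SSD-backed staging file, using the POSIX/Linux O_DIRECT flag. The staging path and block size are hypothetical, and the project's actual designs rely on RDMA-based and user/kernel hybrid access schemes rather than this plain write path.

    /* Minimal sketch: bypassing the kernel page cache when writing to an
     * SSD-backed staging file with O_DIRECT.  The path "/mnt/ssd_staging"
     * is hypothetical; O_DIRECT requires the buffer, offset, and length
     * to be suitably aligned (here, 4 KiB). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t block = 4096;                     /* alignment/transfer unit */
        void *buf;

        if (posix_memalign(&buf, block, block) != 0) { /* aligned staging buffer  */
            perror("posix_memalign");
            return 1;
        }
        memset(buf, 'x', block);                       /* payload to stage        */

        /* O_DIRECT moves data directly between the user buffer and the device,
         * skipping the page cache (i.e., bypassing main-memory buffering). */
        int fd = open("/mnt/ssd_staging/chunk.0",
                      O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (write(fd, buf, block) != (ssize_t)block)
            perror("write");

        close(fd);
        free(buf);
        return 0;
    }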

This study will have a wide-ranging impact on the future provisioning of extreme-scale machines and will provide formative guidelines to this end. The result of this research will be a set of integrated techniques that can fundamentally change the current parallel I/O model and accelerate petascale I/O pipelines. Further, this research will help analyze the utility of SSDs in day-to-day supercomputing I/O and inform the wider HPC community of their viability.

Project Report

The overall goals of this research are to design and build a novel "in-job data staging architecture" from a dedicated set of nodes using a combination of node-local memory and emerging hardware such as Solid State Disks (SSDs). This staging ground will be used to perform scalable I/O and data analytics while a much larger number of nodes run the parallel application. The OSU part of this collaborative research focuses on: 1) designing appropriate RDMA-based mechanisms to access the staging area consisting of SSDs, 2) devising novel schemes (user-level, kernel-level, and hybrid) with memory bypass to access SSDs, and 3) providing InfiniBand-level QoS-aware access to the staging area.

Research has been carried out in all three areas. RDMA-based designs for fast checkpointing and migration have been developed with SSDs, and these designs have been combined with the InfiniBand QoS framework to provide performance isolation between communication and I/O traffic. Hybrid schemes have been proposed to access SSDs with InfiniBand and RDMA. New file-system designs have been proposed to support SCR (a popular checkpointing library) for application-level checkpointing; the new scheme has been demonstrated to scale up to 3 million MPI tasks. The checkpointing and migration schemes with RDMA and QoS support have been integrated into the popular open-source MVAPICH2 MPI library.

The research results have been presented at international conferences and workshops, and have also been disseminated through keynote talks, invited tutorials, and invited talks given by the PI. The designs have been integrated into and made available with the MVAPICH2 software library, which is used by more than 2,000 organizations in 70 countries. Many production systems, including XSEDE systems, use this library to extract performance and scalability from modern InfiniBand clusters. The proposed designs have also influenced other MPI libraries to adopt similar features and support.
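
For context on the application-level checkpointing mentioned above, the following is a minimal sketch of how an MPI application drives a checkpoint through SCR's classic (v1) API, which routes each rank's checkpoint file to fast storage chosen by SCR (for example, a node-local SSD) and later drains it to the parallel file system. The file name and payload are illustrative; this shows only the application-facing pattern the new designs support, not the file-system designs themselves.

    /* Minimal sketch of application-level checkpointing with the SCR
     * (Scalable Checkpoint/Restart) library, using its classic v1 API.
     * The checkpoint contents and file name are illustrative. */
    #include <mpi.h>
    #include <scr.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        SCR_Init();                          /* initialize SCR after MPI_Init   */

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int need = 0;
        SCR_Need_checkpoint(&need);          /* let SCR decide if it is time    */
        if (need) {
            SCR_Start_checkpoint();          /* open a checkpoint epoch         */

            char name[256], path[SCR_MAX_FILENAME];
            snprintf(name, sizeof(name), "ckpt_rank%d.dat", rank);
            SCR_Route_file(name, path);      /* SCR picks the actual location,
                                                e.g., node-local SSD staging    */

            FILE *fp = fopen(path, "w");
            int valid = (fp != NULL);
            if (fp) {
                fprintf(fp, "application state for rank %d\n", rank);
                fclose(fp);
            }
            SCR_Complete_checkpoint(valid);  /* mark this rank's epoch done     */
        }

        SCR_Finalize();                      /* flush/finalize before MPI ends  */
        MPI_Finalize();
        return 0;
    }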

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 0937842
Program Officer: Almadena Y. Chtchelkanova
Project Start:
Project End:
Budget Start: 2010-05-01
Budget End: 2013-04-30
Support Year:
Fiscal Year: 2009
Total Cost: $90,000
Indirect Cost:
Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210