Modern high-end computing (HEC) architectures, comprising tens of thousands of processors or more, have widened the gap between raw computing power and the storage and I/O performance needed to support and sustain such calculations. With petascale computing (10^15 floating-point operations per second) just over the horizon, the gap will become even more pronounced. This research aims to address both the limited bandwidth and the poor performance of today's I/O storage systems. The main idea is to replace traditional rotating disks with a 'memory pool' - a massive grid of solid-state (flash) memories - and to introduce a new memory hierarchy model based on the improved access time and bandwidth of such a grid. A memory pool embedded in and distributed across the compute engine leverages the rapid progression of solid-state storage technology.
The I/O-aware multithreaded program execution model takes full advantage of the high bandwidth to the local memory pool while tolerating solid-state storage access latency through proactive distributed prefetching and software-controlled memory caching. In particular, the study examines responses to different patterns of solid-state storage requests (blocking and non-blocking, and of various block sizes), simulated solid-state storage module failures, and extended shared-memory responsiveness and management. The effectiveness of the approach is evaluated using the most common third-party I/O benchmarks.
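The latency-tolerance idea above - issuing non-blocking reads ahead of demand and serving later accesses from a software-controlled cache - can be sketched as follows. This is a minimal illustration, not the project's actual runtime: the names (`PrefetchCache`, `read_block`) and the simulated per-block latency are assumptions introduced for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_LATENCY_S = 0.01  # assumed flash-module access latency (illustrative only)

def read_block(block_id):
    """Simulate a blocking read of one block from a solid-state module."""
    time.sleep(BLOCK_LATENCY_S)
    return bytes([block_id % 256]) * 4  # dummy 4-byte payload

class PrefetchCache:
    """Software-controlled cache that hides access latency by issuing
    non-blocking reads for blocks the application is expected to need."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # block_id -> Future holding the in-flight read

    def prefetch(self, block_ids):
        # Issue non-blocking reads; results complete in the background
        # while the compute thread keeps working.
        for b in block_ids:
            if b not in self._pending:
                self._pending[b] = self._pool.submit(read_block, b)

    def get(self, block_id):
        fut = self._pending.pop(block_id, None)
        if fut is not None:
            return fut.result()  # hit: usually already resident
        return read_block(block_id)  # miss: fall back to a blocking read

cache = PrefetchCache()
cache.prefetch(range(8))              # predicted sequential access pattern
data = [cache.get(b) for b in range(8)]
```

With several worker threads overlapping the reads, the eight accesses complete in roughly the time of two or three blocking reads rather than eight, which is the essence of latency tolerance through prefetching.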
The intellectual merit of this research can be summarized as: 1) development of a novel I/O architecture model for a class of high-end petascale computing systems; 2) development of a corresponding RAS (reliability, availability, and serviceability) model; and 3) an experimental study of the new I/O architecture and software model developed in (1) and (2) above.