Maintaining high parallelism is critical for efficient use of multicore- and cluster-based parallel systems, but this can be at odds with I/O efficiency, resulting in suboptimal performance of I/O-intensive parallel applications. The efficiency of request-processing policies at different levels of the I/O stack relies on the locality of requests from various processes. Processes are the producers of I/O requests, and their scheduling determines the timing of request issuance and the locality among requests from different processes. When a program is I/O-bottlenecked, the scheduling of its processes directly affects the storage system's efficiency, and thus the program's execution time. In such scenarios, a scheduling policy designed to improve request locality, in preference to the usual objectives such as load balance and fairness, is expected to improve the overall efficiency of the I/O stack and ameliorate I/O bottlenecks.
The investigator proposes a dual-mode execution, incorporating a new data-driven mode to complement the normal statement-driven mode. In the data-driven mode, processes are scheduled so that they issue requests with improved locality and consume data that has been efficiently prefetched. The research focus of this 12-month project is to investigate and understand the extent to which process scheduling can positively affect I/O performance. It takes into account the variance of I/O intensity, the I/O access patterns of individual processes, and the ratio of reads to writes. The investigator will perform an extensive examination of I/O performance behaviors at different I/O layers, including the I/O library, system buffer, and I/O scheduler, under various hypothetical I/O-aware process scheduling strategies. This research will reveal the potential merits of the proposed dual-mode execution and delineate the design space for algorithms supporting it. It will also identify pitfalls and limits of potential designs and their implementations. In doing so, this project is expected to pave the way for the introduction of a disruptive technique for mitigating I/O bottlenecks.
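As a rough illustration of why the order in which processes issue requests matters for locality, the following minimal sketch compares two issuance orders for the same set of requests. It is not the project's scheduler: the workload, the block numbers, and the helper `total_seek_distance` are all hypothetical, and head movement over block numbers is used only as a simple stand-in for the cost of poor spatial locality.

```python
def total_seek_distance(offsets, start=0):
    """Sum of absolute position changes when servicing offsets in the given order."""
    distance, position = 0, start
    for off in offsets:
        distance += abs(off - position)
        position = off
    return distance


# Hypothetical workload: two processes, each reading four consecutive blocks
# from its own region of a file (block numbers are illustrative only).
proc_a = [0, 1, 2, 3]
proc_b = [1000, 1001, 1002, 1003]

# Processes alternate on the CPU, so their requests reach storage interleaved.
interleaved = [off for pair in zip(proc_a, proc_b) for off in pair]

# Each process issues its whole burst before the other runs, which keeps
# requests to neighboring blocks adjacent in the request stream.
grouped = proc_a + proc_b

interleaved_cost = total_seek_distance(interleaved)
grouped_cost = total_seek_distance(grouped)

print("interleaved order:", interleaved, "cost:", interleaved_cost)
print("grouped order:    ", grouped, "cost:", grouped_cost)
```

In this toy model the same eight requests cost far less when each process's burst is kept together. A real system would also have to weigh the scheduling overhead and the loss of load balance that such reordering can introduce.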
This project aims to reveal the potential merits of a computer system design in which process scheduling and I/O request scheduling are conducted cooperatively. In conventional systems, these two kinds of scheduling strategies are designed and executed independently. By monitoring whether the CPU or the I/O devices are the performance bottleneck, the system can adaptively trade the flexibility of one kind of strategy for more room for performance improvement in the other, or use the proposed dual-mode execution. While this proposed solution has the potential to significantly accelerate processing of I/O-intensive programs, such as those using many multi-core computers to analyze big data, it could introduce additional management overhead that may offset its performance advantage. Accordingly, this project aims to identify pitfalls and limits of the proposed design.
After two years of extensive research on these issues, the PI and his team obtained three major outcomes demonstrating the intellectual merits of the project. First, they found that as I/O activity in a program's execution becomes more intensive, it is more difficult to rely on conventional approaches to hide I/O times or improve I/O spatial locality for higher I/O performance. The proposed dual-mode execution can fundamentally address the issue. Second, three factors, namely process count, access locality, and I/O intensity, are most relevant in determining the right execution mode (data-driven mode or statement-driven mode) and in making the dual-mode execution cost-effective. Third, the widespread use of solid-state disks (SSDs) introduces a new dimension to the study: how the layout of data across the SSD and the hard disk affects I/O performance.
The broader impacts of the project are demonstrated by the prototyped systems, the publications, and the human resource training produced during execution of the project. In particular, the PI and his team prototyped two systems. One, named iHarmonizer, automatically parallelizes an OpenMP program and guides its parallel execution to adaptively schedule I/O according to the shifting performance bottleneck. The other, named DualPar, regulates an MPI program's execution by predicting future I/O patterns and adjusting process scheduling accordingly. Both systems showed substantial performance advantages in their respective evaluations. To disseminate the results, four papers were published at the IEEE International Parallel and Distributed Processing Symposium, a major conference dedicated to parallel and distributed computing. At Wayne State University, one Ph.D. student and one master's student used this project as the major part of their thesis work. Both have successfully defended their theses and graduated. Two undergraduate students from groups under-represented in today's CSE discipline (both are women, and one is also African American) participated in the project. They learned skills for selecting and running benchmarks as well as for collecting and analyzing measurements. A new graduate-level topics course, "ECE7995: High-performance I/O Service for Data-intensive Computing", was created. Some of this project's research findings are included in the teaching.