This project will develop a job scheduling and resource allocation system for data-intensive high-performance computing (HPC) based on the congestion pricing of a system's heterogeneous resources. This extends the concept of resource management beyond processing: it allocates memory, disk I/O, and the network among jobs. The research will overcome the critical shortcomings of processor-centric resource management, which wastes huge portions of cluster and supercomputer resources for data-intensive workloads. For example, I/O bandwidth governs the performance of many modern HPC applications but, at present, is neither allocated nor managed. The research will develop techniques that (1) reconfigure the degree of parallelism of HPC jobs to avoid congestion and wastage, (2) support lower-priority, allocation-elastic jobs that can be scheduled on arbitrary numbers of nodes to consume unallocated resource fragments, and (3) co-schedule batch-processing workloads that use system resources left unoccupied by asymmetric utilization and temporal shifts in the foreground jobs. These techniques will be implemented and supported for free public use as extensions to an open-source resource-management framework. If used broadly, the software has the potential to provide much better utilization of the national investment in HPC facilities.

Project Report

The goal of this project was to create techniques for computing with big data that fully utilize the capabilities of modern hardware. Prior techniques for allocating resources are processor-centric; they distribute compute cycles to parallel jobs and do not account for memory and disk bottlenecks. We developed a suite of job scheduling and storage management tools that are data-centric and provide huge performance gains for big data computing.
High IOPS Storage Systems: We built high IOPS (I/O operations per second) storage systems that overcome the write bottlenecks for random workloads. This work scales single-system I/O to the extreme, building engines that fully utilize the capabilities of shared-memory hardware, specifically massive non-uniform memory architectures and arrays of solid-state storage devices (SSDs). It overcame obstacles to scalability, such as remote memory performance, processor/device affinities, and operating system resource contention, to realize more than 1 million IOPS.
Data-Driven Batch Scheduling: We built a data-driven scheduling framework for high-performance computing (HPC) applications that have overlapping data requirements. The framework embodies two principles: schedule the execution of the workload in the order that produces the most efficient I/O schedule, and identify shared I/O among different jobs so that the I/O is performed one time to meet the requirements of all jobs. This framework turns scheduling upside down: the HPC tradition schedules execution order and derives I/O requests from that order, whereas our techniques schedule I/O and derive a processing order from the preferred I/O schedule. Using data-driven scheduling, we compute queries to the Johns Hopkins Turbulence Database (http://turbulence.pha.jhu.edu) at the aggregate streaming I/O rate of the disk array, improving performance by a factor of two to eight.
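The two principles of data-driven scheduling can be illustrated with a minimal sketch. This is not the project's framework: the function names (`build_io_schedule`, `run_schedule`), the job and block identifiers, and the assumption that ascending block IDs correspond to a sequential, streaming-friendly disk order are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_io_schedule(job_requests):
    """Derive an I/O-first schedule from per-job block requests.

    job_requests maps a job id to the block ids that job needs.
    Returns a list of (block_id, [job_ids]) pairs sorted by block id,
    which is ASSUMED here to match the on-disk (sequential) order.
    """
    jobs_by_block = defaultdict(list)
    for job_id, blocks in job_requests.items():
        for block in set(blocks):  # ignore duplicate requests within one job
            jobs_by_block[block].append(job_id)
    return [(block, sorted(jobs_by_block[block])) for block in sorted(jobs_by_block)]


def run_schedule(schedule, read_block, process):
    """Read each block once and hand it to every job that needs it.

    read_block(block_id) and process(job_id, block_id, data) are
    caller-supplied callbacks. Returns the number of reads issued.
    """
    reads = 0
    for block, jobs in schedule:
        data = read_block(block)  # one physical read per block
        reads += 1
        for job_id in jobs:  # processing order follows the I/O order
            process(job_id, block, data)
    return reads


# Hypothetical workload: three queries with overlapping block requirements.
requests = {"q1": [3, 1, 2], "q2": [2, 3, 4], "q3": [4, 5]}
schedule = build_io_schedule(requests)
log = []
reads = run_schedule(
    schedule,
    read_block=lambda b: f"data-{b}",
    process=lambda job, b, data: log.append((job, b)),
)
naive_reads = sum(len(set(blocks)) for blocks in requests.values())
print(f"shared schedule: {reads} reads, job-at-a-time: {naive_reads} reads")
```

In this toy workload the shared schedule issues 5 reads where running each query independently would issue 8. The execution order of the queries is a consequence of the block order rather than the other way around, which is the inversion the report describes.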
Classroom Education: This grant also funded the development of two new undergraduate and graduate computer science courses that focus on big data. Parallel programming has taught more than 300 students to abandon the comfort of serial algorithmic thinking and to harness the power of supercomputers, clouds, GPUs, and multi-core processors. Data-intensive computing is an experiential education course that uses 10 hours of classroom contact and team programming to build data systems and algorithms on the Amazon cloud.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0937810
Program Officer: Almadena Y. Chtchelkanova
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $495,000
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218