Scalability is one of the key challenges of computing with hundreds, if not thousands, of processors. Yet testing software at scale on hundreds of processing cores is impossible when it requires modifying system software that runs with privileged access rights. The inability to change system software at will in large-scale computing installations thus impedes progress in system software.

This project creates a mid-size computational infrastructure, called ARC (A Root Cluster), that directly supports research into scalability for system-level software solutions. ARC temporarily grants users administrator (root) rights and allows them to replace arbitrary components of the software stack. Such replacements range from entire operating systems through drivers and kernel modules to runtime libraries, middleware, and system tools.

ARC ultimately makes it possible to assess the scalability of a multitude of systems research directions that could not otherwise be pursued. Through ARC, methodologies for the scalability of experimental system software can be explored and systematically improved in various institutional projects and beyond. By assessing system software requirements at scale, ARC is positioned to benefit the software systems community and, indirectly, science in general.

Project Report

This project created a mid-size computational infrastructure, called ARC (A Root Cluster), that directly supports research into scalability for system-level software solutions. ARC temporarily grants users administrator (root) rights and allows them to replace arbitrary components of the software stack, enabling a multitude of research directions to be assessed for scalability. During the project period, the ARC infrastructure was planned and specified, a bid was issued, the equipment was purchased, and the hardware and software were installed. The cluster operated in experimental mode from November 2010 and entered production mode in January 2011. Extensive hardware and software enhancements continued during the remainder of the project period, including specialized thermal balancing software.

Educational activities included coverage of parallel programming on clusters and data parallelism on GPUs in graduate and undergraduate classes of the Computer Science curriculum, with resources (slides, programming examples) made publicly available on the web. An educational highlight was a short course, "Introduction to Parallel Programming with Single and Multiple GPUs", taught by the PI, which allowed scientists and non-scientists of all disciplines to acquaint themselves with GPU programming. The online material is periodically updated to include novel features (e.g., OpenACC support).

Research conducted on the ARC infrastructure resulted in numerous publications that have since shaped the landscape of parallel computing, contributing to future advances on the path to exascale computing. This research has helped assess the scalability of experimental system software components and has contributed to a better understanding of infrastructure needs for future system-level research on large-scale many-core and many-node computational facilities. Individual projects exploiting ARC have advanced knowledge in scalable software design for high-performance computing, fault tolerance, I/O, cloud computing, and security. The ARC cluster remains invaluable to the PIs and many users (across numerous departments at and beyond NC State University) in conducting cutting-edge research. It has also fostered new research projects with interdisciplinary participation.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 0958311
Program Officer: Krishna Kant
Project Start:
Project End:
Budget Start: 2010-03-01
Budget End: 2013-02-28
Support Year:
Fiscal Year: 2009
Total Cost: $549,999
Indirect Cost:

Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695