The High End Computing (HEC) field is at a major crossroads due to the advent of many-core technology --- integration of tens and hundred processors (cores) on a single chip, heading to 1000 cores per chip in Exascale systems in the 2015-2020 timeframe. Unlike previous generations of hardware evolution, this shift in the hardware road-map will have a profound impact on future HEC software. First, the programmer will be faced with the scalability challenge of expressing and managing parallelism at the scale of millions to one billion threads in a single system. Second, the programmer will be faced with the locality challenge of optimizing data movement in a highly non-uniform system structure with order-of-magnitude gaps in bandwidth and latency between adjacent levels of the memory hierarchy.
The specific focus of this project is on developing programming models, compilers, and runtimes to address the above challenges of future HEC systems, guided by a specific many-core architecture to ensure practicality. The research will deliver results in the following areas: 1) new parallel programming constructs for many-core architectures such as asynchronous activities, places, and phasers that build on past experiences with the X10 language, but will be manifested in C/C++ instead of Java in this research; 2) new parallel intermediate representations (PIR?s) and compiler optimizations for parallelism and locality; 3) a new thread virtual machine with two levels of parallelism, Synchronous Coarse-Grain Threads (SCTs) and Asynchronous Fine-Grain Threads (AFTs); and 4) evaluation of the programming model, compiler, and runtime on a Cyclops C64 many-core system that directly exemplifies the parallelism and locality challenges facing future HEC systems.