Nikolopoulos, Dimitrios College of William and Mary CCF-0346867
This research activity involves the design, development, and deployment of a programming framework for explicit multilevel parallelization using a global address space model. The framework targets the upcoming generation of petaflop-class supercomputers, which are built on architectural substrates with multiple levels of on-chip and off-chip parallelism and deep memory hierarchies. The research addresses the need for higher productivity and better utilization of national supercomputing resources by attempting to close the gap between innovation in computer architecture and parallel programming practice. The main goal of this activity is to reduce the programming effort required to map algorithmic parallelism onto hierarchical hardware components with heterogeneous mechanisms for parallel execution, and to assist programmers in deriving balanced designs for layered parallel applications. The programming framework investigated in this research unifies parallel programming models and methodologies and enables faster adaptation of parallel code to new hardware platforms. At the same time, it forms a basis for educating and training interdisciplinary student audiences in high-performance programming.
The parallel programming component of this research is designed around standard C++ templates with notation for nested threads and iterators. These parallelism annotations are coupled with a templated representation of data that allows arbitrary partitioning, sharing, and coherence control across multiple levels of parallel execution constructs. The programmer highlights nested parallelism, while the orchestration and management of multigrain threads and data are delegated to the compiler and the runtime system. The research investigates novel methods for controlling the granularity of multilevel parallelism through vertical analysis of the program. Periodicity analysis and selective runtime tracing are used to derive effective data distribution and layout schemes without user intervention. Alongside runtime analysis, new resource-driven scheduling strategies and novel microprocessor features, including on-chip multithreading, on-chip SIMD parallelism, and speculative execution, are incorporated into the parallelization and program optimization processes.
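The abstract does not specify the framework's actual interface, but a minimal C++ sketch can illustrate the style of programming it describes: a container template standing in for the templated data representation, and a nested parallel loop template whose outer level maps block partitions to threads while the inner level remains a sequential loop that a compiler could map onto SIMD lanes. All names here (block_distributed, parallel_for_nested) are hypothetical illustrations, not the project's API.

    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <utility>
    #include <vector>

    // Stand-in for the "templated representation of data": a container
    // that owns its elements and exposes block partitions. In the real
    // framework, partitioning decisions would come from the runtime.
    template <typename T>
    class block_distributed {
    public:
        block_distributed(std::size_t n, std::size_t blocks)
            : data_(n), blocks_(blocks) {}

        std::size_t blocks() const { return blocks_; }

        // Half-open index range [begin, end) covered by block b;
        // the last block absorbs the remainder.
        std::pair<std::size_t, std::size_t> block_range(std::size_t b) const {
            std::size_t per = data_.size() / blocks_;
            std::size_t begin = b * per;
            std::size_t end = (b + 1 == blocks_) ? data_.size() : begin + per;
            return {begin, end};
        }

        T& operator[](std::size_t i) { return data_[i]; }
        std::size_t size() const { return data_.size(); }

    private:
        std::vector<T> data_;
        std::size_t blocks_;
    };

    // Outer (coarse) level: one thread per block. Inner (fine) level: a
    // plain loop left to the compiler for vectorization. A real framework
    // would choose the blocking and nesting depth at run time.
    template <typename T, typename Body>
    void parallel_for_nested(block_distributed<T>& d, Body body) {
        std::vector<std::thread> workers;
        for (std::size_t b = 0; b < d.blocks(); ++b) {
            workers.emplace_back([&d, b, body] {
                auto range = d.block_range(b);
                for (std::size_t i = range.first; i < range.second; ++i)
                    body(d[i], i);
            });
        }
        for (auto& w : workers) w.join();
    }

    int main() {
        block_distributed<double> x(1 << 20, 4);   // 4 coarse partitions
        parallel_for_nested(x, [](double& v, std::size_t i) {
            v = 0.5 * static_cast<double>(i);      // per-element work
        });
        std::cout << "x[42] = " << x[42] << '\n';  // prints 21
    }

The point of the sketch is the division of labor the abstract describes: the programmer only names the parallel structure, while block counts, thread placement, and inner-loop treatment are decisions a compiler and runtime could revise without touching application code.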
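The periodicity analysis mentioned above can likewise be pictured with a toy example: scanning a recorded trace of array indices for the lag at which the stride pattern repeats, a signal a runtime could use to size block distributions. The detection heuristic below (best fraction of repeating strides) is an assumption for illustration only; the abstract does not describe the project's actual algorithm.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Return the lag p (1 <= p <= max_p) at which the stride sequence of
    // the trace best repeats, i.e., the dominant access period.
    std::size_t dominant_period(const std::vector<std::size_t>& trace,
                                std::size_t max_p) {
        std::size_t best_p = 0;
        double best_score = 0.0;
        for (std::size_t p = 1; p <= max_p && p + 1 < trace.size(); ++p) {
            std::size_t matches = 0, total = 0;
            for (std::size_t i = p; i + 1 < trace.size(); ++i, ++total) {
                long long s_now = static_cast<long long>(trace[i + 1]) -
                                  static_cast<long long>(trace[i]);
                long long s_lag = static_cast<long long>(trace[i + 1 - p]) -
                                  static_cast<long long>(trace[i - p]);
                if (s_now == s_lag) ++matches;
            }
            double score = total ? static_cast<double>(matches) / total : 0.0;
            if (score > best_score) { best_score = score; best_p = p; }
        }
        return best_p;
    }

    int main() {
        // Synthetic trace from a two-level loop nest touching a[i*64 + j]
        // for j = 0..7: the stride pattern repeats with period 8.
        std::vector<std::size_t> trace;
        for (std::size_t i = 0; i < 16; ++i)
            for (std::size_t j = 0; j < 8; ++j)
                trace.push_back(i * 64 + j);
        std::cout << "detected period: " << dominant_period(trace, 32) << '\n';
        // A detected period of 8 would suggest distributing the array in
        // chunks aligned to 8-element rows.
    }

In the project's setting, such a signal would be extracted from selective runtime traces and fed back into the data distribution and layout machinery rather than reported to the user.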