A current industry trend aimed at addressing platform performance and power requirements is the creation of heterogeneous manycore systems, composed of general-purpose cores and specialized cores designed to accelerate certain application or system functions. A second trend, designed to make it easier to map a wide variety of functions and components to manycore platforms, is platform-level support for system virtualization. This research develops, implements, and evaluates new virtualization technologies for heterogeneous manycore architectures composed of commodity general-purpose and accelerator cores. The goal is to realize an efficient execution environment for composing and executing a range of computationally intensive and data-intensive applications.
The system abstractions developed include (i) the HVM (heterogeneous virtual machine) platform abstraction for dynamic composition of resources (e.g., cores, accelerators, memory, I/O), (ii) new methods for managing heterogeneous manycore resources, including power, and (iii) specialized execution environments for optimizing accelerator interactions. These components are implicitly integrated through an execution model in which the same abstractions and mechanisms are used to dynamically manage diverse accelerator platforms, thereby realizing our vision of freely shared and customized platform resources provided to applications.
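As a rough illustration of what dynamic composition of resources into an HVM could look like, the following is a minimal sketch only. It is not the project's runtime or its Xen extensions; the names PhysicalPool, HVM, compose_hvm, resize_hvm, and destroy_hvm, as well as the resource keys, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalPool:
    """Free physical resources on one platform (hypothetical model)."""
    free: dict  # e.g. {"cpu_cores": 8, "gpus": 2, "memory_gb": 64}

    def can_satisfy(self, request: dict) -> bool:
        # Every requested resource kind must have enough free capacity.
        return all(self.free.get(kind, 0) >= amount
                   for kind, amount in request.items())


@dataclass
class HVM:
    """A heterogeneous virtual machine: a named bundle of resource grants."""
    name: str
    granted: dict = field(default_factory=dict)


def compose_hvm(pool: PhysicalPool, name: str, request: dict):
    """Create an HVM if the pool can back the whole request, else None."""
    if not pool.can_satisfy(request):
        return None
    for kind, amount in request.items():
        pool.free[kind] -= amount
    return HVM(name=name, granted=dict(request))


def resize_hvm(pool: PhysicalPool, hvm: HVM, kind: str, delta: int) -> bool:
    """Grow (delta > 0) or shrink (delta < 0) one resource of a live HVM."""
    current = hvm.granted.get(kind, 0)
    if delta > 0 and pool.free.get(kind, 0) < delta:
        return False  # not enough free capacity to grow
    if current + delta < 0:
        return False  # cannot shrink below zero
    pool.free[kind] = pool.free.get(kind, 0) - delta
    hvm.granted[kind] = current + delta
    return True


def destroy_hvm(pool: PhysicalPool, hvm: HVM) -> None:
    """Return all of an HVM's resources to the pool."""
    for kind, amount in hvm.granted.items():
        pool.free[kind] = pool.free.get(kind, 0) + amount
    hvm.granted.clear()
```

In this sketch, a caller would build a pool, compose one or more HVMs against it, and resize or destroy them as demand changes. A real system would also have to handle scheduling, sharing of a single accelerator among several HVMs, and power management, none of which is modeled here.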
This project addressed challenges stemming from fundamental trends reshaping the high-performance and enterprise computing landscape. First, with multicore architectures, performance scaling is being achieved both through increased replication of general-purpose processing cores and through small-footprint, low-power, customized accelerators such as graphics processors (GPUs). This has led to the emergence of heterogeneous many-core platforms: systems composed of general-purpose cores intermingled with customized cores, jointly using diverse memory and cache hierarchies, both on chip and in rack-scale and multi-rack-scale systems. Second, virtualization technologies create new opportunities for exploiting the many cores present in current and future platforms, via their shared use by consolidated workloads. The combination of heterogeneous many-core architectures and ubiquitous resource virtualization has had a disruptive impact on the systems software required for future many-core systems. In response to these trends, this research contributed new virtualization technologies for heterogeneous many-core architectures. The goal was to realize an efficient execution environment for computationally intensive and/or data-intensive applications. Toward this end, the project defined and evaluated a novel execution model for applications running on heterogeneous machines. The model defines the computational objects and associated metadata that are produced by the compilation environment and, based on these descriptions, determines how these objects are executed by the underlying hardware. The model's execution environment is based on the virtualization of accelerator and general-purpose hardware resources, to create heterogeneous virtual machines (HVMs).
HVMs serve as the principal abstraction for the dynamic and customized composition of core, accelerator, memory, and I/O resources, and they are mapped and scheduled onto physical resources by the execution environment. The execution environment therefore presents virtual platforms (HVMs) to applications. HVMs can be created and composed dynamically, with their mappings to resources controlled and managed by platform virtualization technologies, thereby realizing our vision of freely shared and customized platform resources provided to applications. Specific outcomes of this project included:
- delivery of the HVM infrastructure in the form of a runtime and extensions to the open-source Xen virtual machine monitor, targeted to commodity processors (x86) and accelerators (e.g., NVIDIA GPUs) and compatible with modern compilation toolchains;
- development of resource monitoring and management methods that efficiently deal with various degrees of resource asymmetry and heterogeneity, and that achieve high levels of application performance jointly with high levels of resource utilization and/or desired application-level or VM-level SLAs;
- a flexible runtime system, programming model, and accompanying tools permitting instrumentation, profiling, and/or translation of application execution contexts from one type of platform resource (e.g., GPU) to another (e.g., CPU), so as to deal with potential mismatches between the types of resources required by the HVM application and the available underlying hardware;
- software infrastructure that permits disaggregated heterogeneous resources to be consolidated into a unified platform, capable of executing codes whose requirements for compute, accelerators, or memory exceed the limits of single machines or single coherence domains;
- mechanisms to deal with memory heterogeneity beyond NUMA, i.e., beyond memory differentiated as "faster/closer" vs. "slower/farther", to also consider the presence of persistent memory, that is, storage class memory (SCM) such as NVRAM, as envisioned for future exascale systems; and
- characterization of application behavior to drive the efficient use of the underlying heterogeneous machine resources on which applications run.
This award helped develop important software infrastructure and tools for the use of present and future multi-core and many-core platforms, including those envisioned for future exascale designs, both for enterprise and datacenter settings, such as current virtualized cloud platforms, and for future high-end computing installations. The impact of the technologies developed under this award is evident from extensive technical collaborations with US and international industry and academic institutions, centered on topics originally investigated under this award, as well as from the community of users of the software artifacts resulting from this funding. In addition to numerous publications and invited talks, several software artifacts developed under this award are publicly available through Google Code. For select software components, Ocelot in particular, several tutorials were organized and delivered at premier architecture and compiler/runtime conferences (e.g., HPCA, PACT, IISWC, and others). The Ocelot infrastructure has had approximately 10,000 downloads to date and is used in projects at several universities around the world and at several companies, including NVIDIA and AMD. Finally, the project fully or partially supported over 20 students over its duration, with additional students involved indirectly through class and special projects. These students were exposed to major technical advances in the program, acquired advanced skills, and developed the technical maturity important to the industries that hired them.