Frequent doubling of computer system performance has facilitated innovations in science, education, government, and commerce. The foundations of these improvements, Moore's Law improving density and Dennard scaling reducing transistor power, have enabled chips with exponentially more transistors at roughly fixed power and cost. However, upcoming physical limits threaten to force a choice between escalating chip power and cost, or stagnant chip performance where the lowest cost wins.
This project seeks a new middle approach, called dark silicon, that keeps the number of powered-on transistors roughly constant even as the number of transistors per chip grows. Rather than add general-purpose cores, future mainstream chips will deploy many accelerators to improve performance or power by 10x-100x. Such chips will turn on one or more accelerators when needed to help power and/or performance and leave most others off. Accelerators for systems-on-chip (SoCs) have already been designed for uses such as encryption, (de)compression, network protocols, XML, and graphics. This research seeks to invent and refine architecture and system support to make existing and future accelerators possible in mainstream processors. Specific focus is given to low-overhead solutions that facilitate fine-grain use and allow accelerators to be used and shared while protecting security and privacy. The project relies on co-design of hardware and software interfaces to accelerators in order to enable direct, low-latency access from user-mode code via coherent shared-memory communication.
More efficient access to accelerators enables Moore's Law to bring continued performance increases without corresponding increases in power. This can reduce the overall power consumption of computing and reduce greenhouse gas production, as well as provide greater computation power at any scale, whether in a mobile device or a supercomputer. The PIs continue to broadly impact the state of the art in computer systems through students, courses, talks, industrial affiliates, commercial influence, and sharing of infrastructure.
Over the past four decades, computer performance has doubled approximately every 18 months, transforming science, education, government, and commerce. However, the foundation of these improvements, Moore's Law with Dennard scaling, seems likely to end within the next decade, forcing a new approach to computer design. In one possible approach, a computer chip will consist of a heterogeneous collection of specialized computing units. Just like a Swiss Army pocket knife with a knife blade, scissors, corkscrew, and fingernail file, such a future computer may only be able to use one specialized unit at a time, but with much greater performance and efficiency for a given task. This project consisted of three major thrusts. First, we developed a model to help decide whether including a given specialized compute unit would actually improve performance; this is analogous to understanding how much some users might benefit from including a magnifying glass in a Swiss Army knife. Second, we developed memory management techniques that allow specialized computing units to access a conventional computer memory hierarchy; this is loosely analogous to ensuring that the different "blades" of a Swiss Army knife open and close in a conventional way. And third, we developed policies for determining which specialized compute units, or "blades," would be most appropriate for a given task. Specific outcomes include models and mechanisms for designing, interfacing to the cache and virtual memory systems, and efficiently using specialized compute units in a heterogeneous computer system. The project also contributed to the development of the gem5-gpu simulation package (https://gem5-gpu.cs.wisc.edu/wiki/) and the BadgerTrap tracing tool (http://research.cs.wisc.edu/multifacet/BadgerTrap/), which are now available as open-source software.
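The question behind the first thrust, whether a specialized unit is worth including, can be illustrated with a simple Amdahl-style calculation. The sketch below is not the project's model: the function names, the parameters, and the fixed offload-overhead term are all assumptions chosen for illustration.

```python
def accelerated_speedup(fraction, accel_speedup, overhead_fraction=0.0):
    """Overall speedup when part of a workload runs on an accelerator.

    All parameters are illustrative assumptions, not the project's model:
      fraction          -- share of baseline run time that can be offloaded (0..1)
      accel_speedup     -- how much faster the accelerator runs that share
      overhead_fraction -- cost of invoking the accelerator (data movement,
                           setup), as a share of baseline run time
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    if overhead_fraction < 0.0:
        raise ValueError("overhead_fraction must be non-negative")

    # Baseline run time is normalized to 1.0.
    new_time = (1.0 - fraction) + fraction / accel_speedup + overhead_fraction
    return 1.0 / new_time


def worth_including(fraction, accel_speedup, overhead_fraction=0.0):
    """True if the accelerator makes the workload faster overall."""
    return accelerated_speedup(fraction, accel_speedup, overhead_fraction) > 1.0


if __name__ == "__main__":
    # Half the work offloaded to a 10x unit, with 1% invocation overhead.
    print(accelerated_speedup(0.5, 10.0, 0.01))
    # A 100x unit that covers only 5% of the work but costs 6% to invoke.
    print(worth_including(0.05, 100.0, 0.06))
```

Under these assumptions, a very fast unit can still be a net loss when the offloaded share is small relative to the cost of reaching it, which is why low-overhead access to accelerators matters.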