The AfterBurner project looks at improving single-thread performance on both simple and high-performance out-of-order cores in an energy-efficient way. Aside from explicit parallelism, this is the primary challenge of multi-core architectures going forward. The most energy-efficient way to improve single-thread performance is to accelerate low-performing program regions. This approach yields the greatest benefit. It also has a low cost because it does not require high-bandwidth execution, making it applicable to both simple and high-performance cores. Low single-thread performance is caused by squashes due to control and data mis-speculations and by long-latency loads and stores, which clog the pipeline. AfterBurner unifies two recently proposed techniques (speculative retirement, which can efficiently buffer large numbers of completed instructions, and selective re-execution, which can re-execute dynamically generated program subgraphs to back-patch program state) and uses them to tolerate all four classes of low-performance events. AfterBurner's multi-purpose infrastructure approach to performance reduces cost, simplifies design, and expands applicability to code that suffers from different low-performance events simultaneously.

In addition to supporting education and student training, the AfterBurner project marks the beginning of a systems research collaboration between the University of Pennsylvania and Drexel computer science departments.

Project Report

FINAL OUTCOMES REPORT FOR AWARD # CCF-1017654
AFTERBURNER: EFFICIENT PERFORMANCE SCALING VIA POST-RETIREMENT PROCESSING
REPORT PERIOD: START DATE 09-01-2010, END DATE 12-31-2011
PI: MARK HEMPSTEAD, DREXEL UNIVERSITY

Power consumption is the main concern facing designers of modern computing systems, as transistor power density is increasing with each new process technology generation. Power consumption affects multiple layers of the design space, from high-performance microarchitecture to system architecture and operating systems. Through a combination of circuit simulations and measurements of actual hardware we have developed power models, and we are using these models to evaluate new microarchitecture and system design ideas. Over the 15 months of the project, our research has resulted in three published papers, including a best paper nominee at HPCA, a high-impact conference on high-performance computer architecture.

The goal of the AfterBurner project is to study new microarchitecture techniques to improve the performance of regions of code that exhibit small amounts of instruction-level parallelism (low-ILP regions). We target low-ILP regions because they are a key factor limiting improvements in single-thread performance. These new microarchitecture techniques must be evaluated in terms of both performance and energy efficiency.

The AfterBurner project is a collaboration between Prof. Mark Hempstead at Drexel University and Prof. Amir Roth at the University of Pennsylvania. The Drexel portion of the project, which was conducted during the first year of the three-year project, focused on power modeling efforts at both the system and microarchitecture levels. The research accomplishments of this project span three areas: high-performance computer architecture, low-power medical imaging, and energy-efficient operating systems.
Power Modeling of Reference Counts for High-Performance Microarchitectures - We evaluated reference counting, a new microarchitecture technique to manage register allocation. We developed custom RTL and HSPICE circuit models of the reference count structure. Our results show that reference counting can be used to reduce energy consumption and to enable new mechanisms, including fine-grained power-gating, cheap checkpointing, and move elimination. This work will be published at HPCA 2012 and has been nominated for best paper.

Evaluation of Low Power Accelerator Architectures for Ultrasound Imaging - We studied the trend of using specialized energy-efficient hardware accelerators to increase overall system performance, using an algorithm that de-speckles ultrasound images as a case study. We developed a customized circuit for this application and compared it to both CPU and GPU (graphics processing unit) implementations of the algorithm. This work was published at CASES 2011, a top conference for embedded system architecture.

Energy-proportional Architectures and Operating Systems - Researchers have observed that the energy consumption of a system is often not proportional to the machine's computation load. We have proposed a solution to this problem in which the operating system manages a system of heterogeneous components and selects the most energy-efficient ensemble for each application. An overview of this approach was presented at HotOS'11.
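
As a rough illustration of the reference-counting idea mentioned in the first item above, the following is a minimal software sketch of how per-register reference counts can support move elimination. It is not the RTL or HSPICE design evaluated in the project; the class name, method names, and the policy of releasing a register as soon as its mapping is overwritten are all assumptions made for this example (a real pipeline would typically defer the release until the overwriting instruction retires).

```python
class RefCountedRegisterFile:
    """Toy model of a physical register file managed by reference counts.

    Hypothetical illustration only: names and policies are assumptions,
    not the design evaluated in the report.
    """

    def __init__(self, num_physical):
        self.ref_counts = [0] * num_physical      # one counter per physical register
        self.free_list = list(range(num_physical))
        self.rename_map = {}                      # architectural name -> physical index

    def _release(self, phys):
        # Drop one reference; the register becomes free when no mapping uses it.
        self.ref_counts[phys] -= 1
        if self.ref_counts[phys] == 0:
            self.free_list.append(phys)

    def write(self, arch_reg):
        """Rename a destination register: allocate a fresh physical register."""
        if not self.free_list:
            raise RuntimeError("no free physical registers")
        phys = self.free_list.pop(0)
        self.ref_counts[phys] = 1
        old = self.rename_map.get(arch_reg)
        self.rename_map[arch_reg] = phys
        if old is not None:
            # Simplification: release immediately instead of at retirement.
            self._release(old)
        return phys

    def move(self, dst, src):
        """Move elimination: dst shares src's physical register, no copy executes."""
        phys = self.rename_map[src]
        self.ref_counts[phys] += 1
        old = self.rename_map.get(dst)
        self.rename_map[dst] = phys
        if old is not None:
            self._release(old)
        return phys


rf = RefCountedRegisterFile(4)
p0 = rf.write("r1")          # r1 gets a fresh physical register
rf.move("r2", "r1")          # r2 now shares it; the count rises to 2
rf.write("r1")               # r1 is overwritten; the shared register stays live for r2
rf.write("r2")               # last reference dropped; the register returns to the free list
```

In this sketch a physical register is reclaimed only when its count reaches zero, which is what lets two architectural registers safely share one physical register after a move.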

Project Start:
Project End:
Budget Start: 2010-09-01
Budget End: 2011-12-31
Support Year:
Fiscal Year: 2010
Total Cost: $118,999
Indirect Cost:
Name: Drexel University
Department:
Type:
DUNS #:
City: Philadelphia
State: PA
Country: United States
Zip Code: 19102