As computing devices solve increasingly complex and diverse problems, engineers seek to design processors that provide higher performance while remaining energy-efficient for environmental reasons. To achieve this, processor vendors have embraced manycore devices, in which thousands of cores cooperate on a single chip to solve large-scale problems in parallel. They have further incorporated heterogeneity, combining cores with different architectures on a single chip in a bid to deliver ever-increasing performance per watt. This project advances the search for energy-efficient performance by developing novel cross-core learning techniques. In current chips, each core individually learns the behavior of the parallel programs it runs in order to execute them more efficiently in the future, devoting complex and power-hungry hardware structures to this task. This research observes, however, that parallel programs tend to exercise the hardware structures of different cores in correlated ways, meaning that behavior learned on one core can be communicated to other cores for a variety of performance and power benefits. As such, this form of intelligent cross-core information exchange can achieve high performance per watt across computing domains, from datacenters to embedded systems.
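To make this observation concrete, consider a minimal SPMD-style sketch in C++ (illustrative only; the worker function and data layout are our own assumptions, not the project's). Every thread runs the same loop body over its own slice of the data, so each core's branch predictor and stride prefetcher would relearn essentially identical state; cross-core learning would instead let one core's trained state seed the others.

    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    // Each worker executes the same loop over its own chunk, so every core
    // observes the same branch outcomes and the same stride-1 access pattern.
    void worker(const std::vector<double>& in, std::vector<double>& out,
                std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)  // identical branch behavior per core
            out[i] = 2.0 * in[i];                  // identical memory stride per core
    }

    int main() {
        constexpr std::size_t n = 1 << 20, nthreads = 8;
        std::vector<double> in(n, 1.0), out(n, 0.0);
        std::vector<std::thread> pool;
        const std::size_t chunk = n / nthreads;
        for (std::size_t t = 0; t < nthreads; ++t)
            pool.emplace_back(worker, std::cref(in), std::ref(out),
                              t * chunk, (t + 1) * chunk);
        for (auto& th : pool) th.join();
    }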

In this light, this research provides techniques to deduce how similarly a parallel program's threads exercise their cores' hardware structures, drawing on a range of programmer, compiler, and architectural mechanisms to do so. When such similarity is detected, cross-core learning hardware gleans the information that is most useful to exchange for performance or power gains and transmits it among heterogeneous cores using low-overhead hardware/software techniques. The project develops a lightweight runtime software layer to orchestrate this information exchange, relying on dedicated hardware support where necessary. Building on this framework, cross-core learning is applied to a number of specific problems, ranging from higher-performance manycore cache prefetching and branch prediction, to performance- and power-management techniques for interrupts and exceptions in scale-out systems, to thread and instruction scheduling. Furthermore, the project broadly disseminates knowledge on how to design and program large-scale manycore (or scale-out) systems by involving students at the graduate, undergraduate, and high-school levels through active research and coursework. Overall, this work impacts the engineering community and broader society by: (1) helping to achieve high-performance yet energy-efficient and environmentally friendly computing systems; (2) providing academics and chip designers with a design methodology and infrastructure for studying manycore design; (3) broadening the participation of underrepresented groups in computer science; and (4) educating graduate, undergraduate, and high-school students in parallel programming for manycore systems.
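As one hypothetical illustration of what such a runtime layer might do, the sketch below seeds a follower core's stride-prefetcher state from a leader core when a similar thread is detected (here, crudely, by a shared parallel-region entry address). All type, field, and function names are assumptions made for illustration, not the project's actual interfaces; in real hardware the copied state would travel as a predictor-metadata message rather than a struct assignment.

    #include <array>
    #include <cstdint>
    #include <unordered_map>

    // Per-entry state of a simple stride prefetcher (assumed layout).
    struct StrideEntry {
        std::uint64_t last_addr = 0;
        std::int64_t  stride    = 0;
        std::uint8_t  conf      = 0;
    };
    struct PrefetcherState { std::array<StrideEntry, 64> table{}; };

    class CrossCoreRuntime {
        std::unordered_map<std::uint64_t, int> leader_for_region_;  // region PC -> leader core
        std::unordered_map<int, PrefetcherState> core_state_;       // core id -> trained state
    public:
        // Called when `core` begins executing the parallel region at `region_pc`.
        void on_thread_start(int core, std::uint64_t region_pc) {
            auto it = leader_for_region_.find(region_pc);
            if (it == leader_for_region_.end()) {
                leader_for_region_[region_pc] = core;        // first arrival trains from scratch
            } else {
                core_state_[core] = core_state_[it->second]; // follower: seed from leader's state
            }
        }
        // Snapshot of the leader's trained state; in hardware this would be
        // continuous predictor updates rather than an explicit call.
        void record_training(int core, const PrefetcherState& s) { core_state_[core] = s; }
    };

Keying similarity on the parallel-region entry address is merely the simplest plausible detector; the richer programmer, compiler, and architectural mechanisms described above would take its place in practice.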

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Application #: 1253700
Program Officer: Anindya Banerjee
Project Start:
Project End:
Budget Start: 2013-07-01
Budget End: 2019-03-31
Support Year:
Fiscal Year: 2012
Total Cost: $520,000
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854