Microprocessors face an energy wall in scaling performance, and this wall requires a major departure from the scaling techniques of the past 25 years (microarchitecture innovation and caches). In the Exascale computing time window (2018+), energy will be the key constraint even with parallelism, advanced circuit techniques such as near-threshold voltage, and simplified microarchitectures. This implies a need for architectures (and software) that make the most efficient use of each transistor switch to complete application work.
Many researchers have published techniques that exploit heterogeneity or customization and have the potential to reduce energy and increase performance, often by 10x or more. However, few of these techniques have reached large-scale deployment because of the 90/10 optimization model, which discriminates against innovations that benefit only a portion of the average workload. To date there has been no systematic framework for thinking about how to introduce such heterogeneity, and no disciplined method for analyzing and optimizing for workloads that share similar properties. We propose the exploratory development of 10x10, a transformative model that enables broad exploitation of customization for higher energy efficiency and performance by identifying 10 clusters of application computational structure and enabling specialization for each.
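The arithmetic behind the 90/10 objection, and behind the hope for 10x10, can be made concrete with an Amdahl's-law-style estimate. The sketch below is illustrative only: the equal 10% cluster shares and uniform 10x gains are assumptions chosen to mirror the name 10x10, not measured results, and `overall_gain` is a hypothetical helper.

```python
def overall_gain(fractions, gains):
    """Amdahl-style overall improvement factor (illustrative sketch).

    fractions[i] is the assumed share of baseline time (or energy) spent in
    workload cluster i; gains[i] is the assumed improvement factor that a
    customized engine achieves on that cluster (1.0 means no improvement).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Remaining cost is each cluster's share divided by its gain, summed.
    return 1.0 / sum(f / g for f, g in zip(fractions, gains))


# "90/10"-style: one customized unit gives 10x on a cluster that is 10%
# of the work; the other 90% runs on the general-purpose core unchanged.
single = overall_gain([0.1, 0.9], [10.0, 1.0])

# "10x10"-style (assumed): ten clusters, each 10% of the work, each 10x.
ten_by_ten = overall_gain([0.1] * 10, [10.0] * 10)

print(f"one 10x accelerator: {single:.2f}x")      # one 10x accelerator: 1.10x
print(f"ten 10x clusters:    {ten_by_ten:.2f}x")  # ten 10x clusters:    10.00x
```

Under these assumptions, a 10x gain on a single cluster yields only about 1.1x overall because the untouched 90% dominates; the overall gain approaches 10x only when every cluster receives comparable specialization. Real cluster shares and per-cluster gains would have to come from the workload analysis the 10x10 framework proposes.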
To meet these needs, we will develop the 10x10 model: a disciplined framework for analyzing workloads and dividing them into separate clusters for energy and performance optimization, and for introducing heterogeneity/customization in a disciplined way that yields understandable and predictable benefits. We introduce the notion of a 10x10 architecture, which uses the 10x10 framework to drive the design of energy-efficient, higher-performance microprocessors in a technology scaling regime that yields plentiful transistors but only modest energy scaling.
Because 10x10 architectures may make better use of their transistors than regular replication of traditional cores for parallelism, they can be expected to outperform replicated-core designs in low-parallelism ("sequential") phases. In addition, because of their greater energy efficiency, they should also outperform parallel systems built by replicating traditional cores in highly parallel phases. In short, if 10x10 research is successful, it will transform how we think about application workload analysis, computer architecture and implementation, and software compiler tools.
If successful, these efforts will transform the thinking of both the computing research community and industry. The potential is to break out of the "local minimum" imposed by the power wall and the end of Moore's Law, enabling scientific breakthroughs facilitated by exascale simulation of physical and mathematical processes. Because it deals with a novel and potentially transformative architecture idea, this work will also feed into the education and experience of new computer scientists. Improvements in energy efficiency on heterogeneous systems will be disseminated to new NSF systems such as the Track 2D experimental system at Oak Ridge.