Microprocessors are facing an energy wall in scaling performance, which requires a major departure from the scaling technologies of the past 25 years (microarchitecture innovation and caches). In the Exascale computing time window (2018+), even with parallelism, advanced circuit techniques such as near-threshold voltage, and simplified microarchitectures, energy will be the key constraint. This implies a need for architectures (and software) that make the most efficient use of every transistor switching event to complete application work.
Many researchers have published techniques that exploit heterogeneity or customization and have the potential to reduce energy and increase performance, often by 10x or more. However, few of these techniques have made it into large-scale deployment because of the 90/10 optimization model, which discriminates against innovations that benefit only a portion of the average workload. To date there has been no systematic framework for thinking about how to introduce such heterogeneity, and no disciplined method for analyzing and optimizing for workloads that share similar properties. We propose exploratory development of 10x10, a transformative model that enables broad exploitation of customization for higher energy efficiency and performance by looking at 10 clusters of application computational structure and enabling specialization for each.
To meet these needs, we will develop a 10x10 model: a disciplined framework for analyzing workloads, dividing them into separate clusters for energy and performance optimization, and introducing heterogeneity/customization in a disciplined way that yields understandable and predictable benefits. We introduce the notion of a 10x10 architecture, which exploits the 10x10 framework to drive the design of more energy-efficient and higher-performance microprocessors in a technology scaling regime that yields plentiful transistors but only modest energy scaling.
Because these 10x10 architectures may make better use of their transistors than regular replication of traditional cores for parallelism, they can be expected to outperform such systems in low-parallelism ("sequential") phases. In addition, because of their greater energy efficiency, they should also outperform parallel systems based on the replication of traditional cores in phases with high parallelism. In short, if 10x10 research is successful, it will transform how we think about application workload analysis, computer architecture and implementation, and software compiler tools.
If successful, these efforts will transform the thinking of the computing research community and the industry. The potential is to break out of a "local minimum" imposed by the power wall and the end of Moore's Law, enabling scientific breakthroughs facilitated by exascale simulation of physical and mathematical processes. This work will tie into the education and experience of new computer scientists, dealing as it does with a novel and potentially transformative architecture idea. Improvements in energy efficiency on heterogeneous systems will be disseminated to new NSF systems such as the Track 2D experimental system at Oak Ridge.
The 10x10 Project was a highly successful NSF project, creating significant new knowledge that will help to accelerate the progress of computing technology. This progress benefits not only our fundamental understanding of computing, but also delivers significant societal benefits through the ever-expanding range of computing applications, internet services, mobile applications, and deep corporate computing (for engineering, stock trading, weather forecasting, etc.). Indeed, these are exciting times for computing.
Specifically, the 10x10 project produced new insights that enable:
- a new approach to computer architecture, not widely accepted by the current computer industry
- computer systems up to 20 times faster than those of today
- computer systems as much as 20 times more energy efficient than those of today
The 10x10 project also enhanced the training and education of the entire team, including 6 undergraduate students, 2 graduate students, and 3 senior technical staff. The team produced excellent results and published them in two journal publications, 6 excellent conference publications, and two technical reports. Ambitious outreach, indeed! The team included participants at the University of Chicago and the University of California San Diego (UCSD). This training helps to maintain US national competitiveness in the computing workforce and the nation's position as a leader in computing. More information is available at http://10x10.cs.uchicago.edu/