Microprocessors face an energy wall in performance scaling, which requires a major departure from the scaling techniques of the past 25 years (microarchitectural innovation and caches). In the exascale computing time window (2018+), even with parallelism, advanced circuit techniques such as near-threshold voltage operation, and simplified microarchitectures, energy will be the key constraint. This implies a need for architectures (and software) that make the most efficient use of each transistor switching to complete application work.

Many researchers have published techniques that exploit heterogeneity or customization and have the potential to reduce energy and increase performance, often by 10x or more. However, few of these techniques have reached large-scale deployment because of the 90/10 optimization model, which discriminates against innovations that benefit only a portion of the average workload. To date there has been no systematic framework for thinking about how to introduce such heterogeneity, and no disciplined method for analyzing and optimizing for workloads that share similar properties. We propose exploratory development of 10x10, a transformative model that enables broad exploitation of customization for both higher energy efficiency and higher performance by identifying 10 clusters of application computational structure and enabling specialization for each.

To meet these needs, we will develop the 10x10 model, a disciplined framework for analyzing workloads and dividing them into separate clusters for energy and performance optimization, together with a disciplined introduction of heterogeneity and customization, which yields understandable and predictable benefits. We introduce the notion of a 10x10 architecture, which exploits the 10x10 framework to drive the design of more energy-efficient and higher-performance microprocessors in a technology scaling regime that yields plentiful transistors but only modest energy scaling.

Because these 10x10 architectures may make better use of their transistors than regular replication of traditional cores for parallelism, they can be expected to outperform such designs in low-parallelism ("sequential") phases. In addition, because of their greater energy efficiency, they should also outperform parallel systems based on the replication of traditional cores in phases with high parallelism. In short, if 10x10 research is successful, it will transform how we think about application workload analysis, computer architecture and implementation, and software compiler tools.

If successful, these efforts will transform the thinking of the computing research community and the industry. The potential is to break out of a "local minimum" imposed by the power wall and the end of Moore's Law, enabling scientific breakthroughs facilitated by exascale simulation of physical and mathematical processes. Because it deals with a novel and potentially transformative architecture idea, this work will also tie into the education and experience of new computer scientists. Improvements in energy efficiency on heterogeneous systems will be disseminated to new NSF systems such as the Track 2D experimental system at Oak Ridge.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1057921
Program Officer: Robert Chadduck
Project Start:
Project End:
Budget Start: 2010-09-01
Budget End: 2012-04-30
Support Year:
Fiscal Year: 2010
Total Cost: $300,000
Indirect Cost:
Name: University of California San Diego
Department:
Type:
DUNS #:
City: La Jolla
State: CA
Country: United States
Zip Code: 92093