This Small Business Innovation Research (SBIR) Phase I project addresses the challenge of designing ultra-low power circuits using new asynchronous design techniques to dramatically reduce both dynamic power and leakage power. The objectives of this project are to demonstrate improved power characteristics on an industrial multi-core processor using Sleep Convention Logic (SCL), a type of asynchronous logic that automatically puts circuits to sleep to reduce leakage power. A 16-core processor will be implemented using SCL and compared to a synchronous implementation, both in the same 65 nm technology. Both implementations will be based upon the same source design described in an industry-standard Hardware Description Language (HDL), and the resulting SCL implementation will be compared against the synchronous implementation to demonstrate functional equivalency. Both the SCL and synchronous implementations will be characterized for dynamic and leakage power consumption, area, and timing. It is anticipated that the SCL implementation will be somewhat larger in area and have somewhat slower timing, but have significantly reduced dynamic and leakage power. The characterization effort will quantify these comparisons. This project is significant in that this will be the first comparison of SCL and synchronous implementations of an industrial circuit in a nanometer fabrication process.

The broader impact/commercial potential of this project will be to establish asynchronous design methodologies based upon industry standards and to empirically quantify the benefits of using SCL for ultra-low power circuits and power-sensitive applications. It is expected that SCL is especially well suited to system-on-chip (SoC) designs that employ a multitude of identical cores. Today, the number of cores used on SoCs is typically limited by total power consumption; if the power consumption of each core can be reduced dramatically, then the number of cores that can be placed on an SoC can be dramatically increased. Larger numbers of cores result in increased bandwidth and energy efficiency, thus enabling increased functionality, especially in applications relying upon advanced signal processing such as high-speed wireless communications, remote sensing, embedded vision, and implantable medical devices. Hearing aids are but one example of applications requiring large amounts of signal processing at extremely low levels of power consumption. Today, limited battery life is a major issue inhibiting both market acceptance and personal convenience of advanced hearing aids. Results of this project will provide a path to easing such limitations, thereby opening up new opportunities in personal communication and medical markets, among many others.

Project Report

A multi-core processor was designed using asynchronous circuit technology to improve energy efficiency. Several different asynchronous technologies were evaluated for use in a 16-core, synchronous processor for embedded vision processing. The results, utilizing a unique implementation of Globally Asynchronous Locally Synchronous (GALS) technology in combination with Dynamic Voltage and Frequency Scaling (DVFS), indicate a power consumption improvement of at least 8.1% across the entire 16-core array when executing a performance-critical and computationally intensive application. Individual processor energy consumption was reduced by as much as 26%. In this project, only the logic was voltage scaled. Larger energy savings are expected when the memory is also voltage scaled. In addition, less computationally intensive applications will show greater savings.

The baseline design was a fully synchronous, power-optimized, 16-core processor. All cores operated off the same clock and with the same supply voltage. Inter-core communication was synchronous and fixed to the global clock frequency. The workload varied from core to core, as is common in multi-core designs, with some processors finishing their workload early while others have no idle time. The challenge was to find a way to exploit this mixture of performance characteristics to reduce energy consumption. In a system with varying workloads, the conventional method for reducing energy consumption is to lower the power supply voltage to the minimum that meets the computation requirements. However, performance drops rapidly as the supply voltage is reduced. With reduced performance, the combinational logic might not be able to complete its tasks in time, leading to unacceptable logic faults. To avoid such faults, the clock frequency must be slowed prior to reducing the supply voltage.
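The ordering constraint described above (slow the clock before lowering the supply voltage) can be illustrated with a minimal Python sketch. The function name, the step labels, and the operating-point values are hypothetical and do not come from the project. The reverse ordering for scaling up (raise the voltage before the clock) is an assumption that mirrors the same reasoning and is not stated in the report.

```python
from typing import List, Tuple

# Hypothetical representation: (clock frequency in MHz, supply voltage in V).
OperatingPoint = Tuple[float, float]


def plan_transition(current: OperatingPoint, target: OperatingPoint) -> List[Tuple[str, float]]:
    """Order the frequency and voltage changes so that the logic never runs
    at a clock rate the present supply voltage might not support."""
    cur_f, cur_v = current
    tgt_f, tgt_v = target
    steps: List[Tuple[str, float]] = []
    if tgt_v < cur_v:
        # Scaling down: slow the clock first, then lower the voltage.
        if tgt_f != cur_f:
            steps.append(("set_freq_mhz", tgt_f))
        steps.append(("set_vdd", tgt_v))
    else:
        # Scaling up (assumed mirror case): raise the voltage first, then the clock.
        if tgt_v != cur_v:
            steps.append(("set_vdd", tgt_v))
        if tgt_f != cur_f:
            steps.append(("set_freq_mhz", tgt_f))
    return steps


if __name__ == "__main__":
    # Illustrative values only.
    for step in plan_transition((400.0, 1.2), (200.0, 0.9)):
        print(step)
```

The sketch assumes that the target frequency is one the target voltage can sustain; it only fixes the order in which the two settings are changed.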
Therefore, the challenge is to find ways to divide a large, complex circuit into multiple blocks, and to custom tailor both the supply voltage and clock frequency for each block. Each block can then operate at just the required performance, and ideally can be adjusted for changing applications or conditions. Converting a circuit from synchronous to asynchronous is one enabler for such supply voltage tailoring. Once a circuit is converted to asynchronous, there is a great deal of flexibility for optimizing supply voltage and performance. An asynchronous GALS technology was used to decouple clocking requirements amongst the individual processors, thereby enabling each processor to run only as fast as necessary. Once clock speeds are reduced, the supply voltage can be lowered, resulting in significant energy savings. One key project outcome is that the clock frequency and supply voltage for each individual core processor are under direct software control. Applying a novel approach named "Switchable Synchronous to Asynchronous voltage and frequency Scaling" (SSAS), an array can alternate between synchronous and asynchronous operation. The SSAS approach enables scheduling of voltage and frequency adjustments at compile time to optimize power and performance for a given application. At more advanced technology nodes and with an increased number of available cores, such flexibility will enable optimizations of power and performance that are not possible without using the technology proven during the Phase I project.

Overall, the Phase I project met the goal of demonstrating that asynchronous architecture can be applied to an already extremely efficient, multi-core processor, resulting in both energy and power savings without sacrificing performance. The proven concept supports development of a new family of programmable, energy-efficient, multi-core processors that can be broadly applied once software tools and support packages are developed during the Phase II project.
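As a rough illustration of per-core voltage and frequency tailoring, the following Python sketch picks, for each core, the lowest operating point that still finishes that core's workload within a shared deadline. It is not the SSAS method or the project's tooling. The operating-point table, cycle counts, and function names are all hypothetical, and the energy estimate uses only the standard approximation that dynamic switching energy scales with the square of the supply voltage, ignoring leakage and memory.

```python
from typing import List, Tuple

# Hypothetical operating points (MHz, V), ordered from lowest to highest voltage.
OPERATING_POINTS: List[Tuple[float, float]] = [
    (100.0, 0.8),
    (200.0, 0.9),
    (300.0, 1.0),
    (400.0, 1.2),
]
NOMINAL_VDD = 1.2  # assumed supply voltage of the fully synchronous baseline


def pick_point(cycles: int, deadline_us: float) -> Tuple[float, float]:
    """Return the lowest-voltage point whose clock meets the deadline."""
    required_mhz = cycles / deadline_us  # cycles per microsecond equals MHz
    for freq_mhz, vdd in OPERATING_POINTS:
        if freq_mhz >= required_mhz:
            return (freq_mhz, vdd)
    raise ValueError("workload cannot meet the deadline at any operating point")


def relative_dynamic_energy(workloads: List[int], deadline_us: float) -> float:
    """Dynamic energy of the tailored array relative to running every core
    at NOMINAL_VDD, using energy proportional to cycles * vdd**2."""
    tailored = sum(c * pick_point(c, deadline_us)[1] ** 2 for c in workloads)
    baseline = sum(c * NOMINAL_VDD ** 2 for c in workloads)
    return tailored / baseline


if __name__ == "__main__":
    # Illustrative per-core cycle counts for one frame; not measured data.
    workloads = [400_000, 380_000, 150_000, 90_000]
    for cycles in workloads:
        print(cycles, pick_point(cycles, deadline_us=1000.0))
    print(relative_dynamic_energy(workloads, deadline_us=1000.0))
```

In this toy model the heavily loaded cores stay at the nominal point while lightly loaded cores drop to lower voltages, which is the kind of mixed workload the report describes. The resulting ratio depends entirely on the assumed table and workloads and should not be compared with the reported measurements.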

Budget Start: 2013-07-01
Budget End: 2013-12-31
Fiscal Year: 2013
Total Cost: $150,000
Name: Nanowatt Design, Inc.
City: Fayetteville
State: AR
Country: United States
Zip Code: 72701