With technology scaling and increasing integration density in the nanometer technology regime, design for yield and reliability has become critical. The objective of this collaborative research is to explore a low-overhead, formal design methodology that uses a distributed network of micro-scale sensors and systematic feedback control to achieve auto-curing of digital, analog, and mixed-signal electronic systems under large process and temporal variations. Such auto-curing approaches will play a key role in preventing yield loss in nanoscale designs while ensuring reliable operation and low power dissipation. The research investigates self-curing concepts and techniques for logic circuits, digital signal processing (DSP) units, embedded memory, and analog components, using appropriate variation sensing and compensation techniques to achieve high yield with optimal power and die-area overhead. It also explores system-level self-curing approaches that use a global parameter sensor and a global controller to determine the optimal compensation of mixed-signal cores under a power constraint. To realize these curing methodologies in an automatic synthesis environment, the research will aim to develop appropriate Computer-Aided Design tools and a library of self-correcting mixed-signal cores.
If successful, the research will help the semiconductor industry deliver complex nanoelectronic systems with high reliability, low power, and high yield. The proposed research will integrate education and training through course development, a summer research program for undergraduates, and senior design projects.
The project has developed and studied a novel technology platform for variation-tolerant, robust system-on-chip (SoC) design through post-manufacturing adaptation to process, environmental, workload, and device-aging-induced variations. The research results include design methodologies; algorithms for system-level healing, application mapping to reconfigurable hardware frameworks, and design automation; novel hardware architectures that support post-manufacturing adaptation to different types of variations and different application domains; and evaluation results from circuit-architecture-level simulation as well as hardware emulation. The major intellectual contributions of the project are described in detail below.
(1) System-level Healing Approach: We have investigated post-silicon process compensation, or 'healing', of integrated circuits (ICs) as a means to improve yield and reliability under parameter variations. In an SoC comprising multiple cores, different cores can experience different process shifts due to local variations. We have developed an efficient system-level healing solution that uses a priori, design-time information about the relative sensitivities of the cores with respect to system performance and power. We have observed that it can minimize the impact on system power while meeting target specifications on output parameters (e.g., video quality). Fig. 1(a) shows the block diagram of the main components of an example mixed-signal self-healing SoC used for image compression, and Fig. 1(b) shows an example local sensor and control knob for the memory core.
(2) Variation Resilience through Operand Truncation: We have developed a novel healing approach, shown in Fig. 2, and an associated algorithm, referred to as VaROT, that provides a low-overhead means of post-silicon healing of delay failures.
It exploits the fact that in typical DSP datapath modules (such as adders and multiply-and-accumulate units), critical timing paths originate from the least significant bits (LSBs), and these paths can be shortened by truncating the LSBs, i.e., setting them to constant values such as "0". Truncating input bits does affect output quality, but in common DSP applications truncating the least significant input bits was found to cause minimal loss in output quality of service (QoS). The approach therefore offers a low-overhead way to heal DSP systems and improve yield under parameter variations.
(3) Adaptive Error Correction for Run-time Failure Tolerance: Post-silicon healing techniques that rely on built-in redundancy are not effective in improving robustness against the various run-time failures in memory circuits. Traditionally, uniform worst-case protection using an Error Correction Code (ECC) is applied to all blocks of a large memory array for run-time error resiliency. However, because the intrinsic reliability of memory blocks shifts both spatially and temporally, such uniform protection can be unattractive in terms of either ECC overhead or protection level. We have investigated a novel reconfigurable ECC approach that adapts, in space and time, to provide the right amount of protection for a memory block at a given time. A circuit-architecture co-design approach has been employed for the encoder/decoder logic to minimize power and performance overhead.
(4) Memory-based Computing in Field Programmable Gate Arrays (FPGAs): FPGAs are increasingly used as a preferred prototyping and accelerator platform for diverse application domains, such as signal processing, security, and real-time multimedia processing. However, mapping these applications to an FPGA typically suffers from poor energy efficiency because of the high energy overhead of the programmable interconnects (PIs) in FPGA devices.
We have explored an energy-efficient heterogeneous application mapping framework for FPGAs, in which conventional mapping of applications to logic and DSP blocks (for DSP-enhanced FPGA devices) is combined with judicious mapping of specific computations to embedded memory blocks (EMBs). Experimental results show that the proposed heterogeneous mapping approach achieves significant energy improvement. Figure 3 shows the application mapping steps that use EMBs for computation.
Broader Impacts: The project has partially supported tuition and stipends for three PhD students and has also supported one MS student. The graduate students involved in this project have learned circuit design, performance and power simulation, algorithm development, and different aspects of signal processing applications. Research activities and results from the project have been used to update undergraduate and graduate courses. In particular, hardware architectures for calibration and repair have been used in the graduate-level Nanometer VLSI Design course (EECS 495) as well as in the undergraduate-level Computer Architecture course (EECS 314). The project has led to several senior projects as well as additional research projects by undergraduate students of the Case School of Engineering. The undergraduate students have worked on developing the hardware emulator for the memory-based adaptive computing framework and have been trained in hardware design and application mapping to Field Programmable Gate Arrays (FPGAs). The project has produced four journal papers, five conference papers, one book ("Computing with Memory for Energy-Efficient Robust Systems", Springer, 2013), and one invited article. It has also led to an EDAA Outstanding Dissertation Award (2012), five invited presentations, and the creation of a Wiki page on computing with memory.
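The system-level healing idea in item (1), using design-time sensitivities to decide which cores to compensate, can be illustrated with a simple greedy heuristic. This is a minimal sketch under stated assumptions, not the project's algorithm. It assumes that each core exposes one discrete control knob (for example, a body-bias or supply step), and that each core's output-quality gain and power cost per knob step are known from design-time characterization. The `Core` fields and the `heal` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical per-core healing model; all fields are illustrative assumptions."""
    name: str
    quality_gain_per_step: float  # assumed design-time sensitivity: quality gained per knob step
    power_per_step: float         # assumed extra power drawn per knob step (e.g., mW)
    max_steps: int                # assumed range of the core's local control knob


def heal(cores, quality_deficit):
    """Greedily assign knob steps until a measured quality deficit is closed.

    Cores whose knob buys the most output quality per unit of extra power are
    adjusted first. Returns (steps per core, total extra power, target met?).
    """
    steps = {c.name: 0 for c in cores}
    extra_power = 0.0
    remaining = quality_deficit
    ranked = sorted(cores, key=lambda c: c.quality_gain_per_step / c.power_per_step,
                    reverse=True)
    for c in ranked:
        # Spend steps on the cheapest-per-quality core until it saturates or the target is met.
        while remaining > 0 and steps[c.name] < c.max_steps:
            steps[c.name] += 1
            extra_power += c.power_per_step
            remaining -= c.quality_gain_per_step
        if remaining <= 0:
            break
    return steps, extra_power, remaining <= 0


if __name__ == "__main__":
    cores = [
        Core("dct", quality_gain_per_step=0.5, power_per_step=2.0, max_steps=4),
        Core("memory", quality_gain_per_step=0.8, power_per_step=1.0, max_steps=3),
        Core("adc", quality_gain_per_step=0.2, power_per_step=0.5, max_steps=2),
    ]
    print(heal(cores, quality_deficit=2.0))
```

Ordering by gain-per-power ratio is not guaranteed to be optimal, because the exact allocation problem resembles a knapsack problem. The sketch only shows how sensitivity information steers compensation toward the cores that recover output quality most cheaply.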
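The reasoning behind operand truncation in item (2) can be checked on a toy model. In a ripple-carry adder, the longest carry chain starts at a bit that generates a carry. Forcing the k LSBs to 0 means no carry can be generated there, so the worst-case chain shrinks from n to n - k bits. The cost is a small numerical error in a multiply-accumulate. This is a hedged sketch rather than VaROT itself: truncating both operands, and the function names `truncate`, `longest_carry_chain` and `mac`, are assumptions made for illustration. How VaROT chooses the number of truncated bits for a failing chip is not modeled.

```python
def truncate(x: int, k: int) -> int:
    """Force the k least significant bits of an operand to 0 (two's-complement semantics)."""
    return x & ~((1 << k) - 1)


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of consecutive carry-outs started by one generate in an n-bit a + b."""
    longest = run = carry = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        generate, propagate = ai & bi, ai ^ bi
        if generate:
            run = 1      # a new carry chain starts at bit i
        elif propagate and carry:
            run += 1     # the incoming carry ripples through bit i
        else:
            run = 0      # no carry leaves bit i
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


def mac(xs, ws, k=0):
    """Multiply-accumulate with both operands truncated by k LSBs."""
    return sum(truncate(x, k) * truncate(w, k) for x, w in zip(xs, ws))


if __name__ == "__main__":
    # Worst case for an 8-bit adder: the carry born at bit 0 ripples through all 8 bits.
    print(longest_carry_chain(1, 255, 8), longest_carry_chain(truncate(5, 2), truncate(255, 2), 8))
    xs, ws = [1000, 2047, 513], [120, 77, 255]
    exact, approx = mac(xs, ws), mac(xs, ws, k=1)
    print(exact, approx, abs(exact - approx) / exact)
```

With k = 2 on 8-bit operands, no input pair can produce a carry chain longer than 6 bits. This is the timing slack that truncation buys, in exchange for the small relative error that the last line prints.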
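The adaptive ECC idea in item (3) can be sketched as choosing, separately for each memory block, the weakest code that still meets a word-failure target for that block's current bit error rate. The selection is repeated as the rate drifts over time. This is a sketch under loud assumptions. The menu of codes (no ECC, a (72,64) SEC-DED code, a (79,64) DEC-TED code), the assumption of independent bit errors, and the per-block error-rate estimates are all illustrative. `SCHEMES`, `word_failure_prob` and `select_ecc` are hypothetical names, not the project's encoder/decoder design.

```python
from math import comb

# Assumed menu of ECC strengths for a 64-bit data word: (name, correctable errors t, check bits).
SCHEMES = [("none", 0, 0), ("SEC-DED", 1, 8), ("DEC-TED", 2, 15)]


def word_failure_prob(p: float, n: int, t: int) -> float:
    """P(more than t of n stored bits flip), assuming independent bit errors at rate p."""
    # Summing the tail directly avoids cancellation error in 1 - P(<= t errors).
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1, n + 1))


def select_ecc(p: float, target: float, data_bits: int = 64):
    """Pick the weakest scheme whose word failure probability meets the target.

    Returns (scheme name, target met?). If even the strongest scheme falls
    short, that scheme is returned with met=False.
    """
    for name, t, check_bits in SCHEMES:
        if word_failure_prob(p, data_bits + check_bits, t) <= target:
            return name, True
    return SCHEMES[-1][0], False


if __name__ == "__main__":
    # Hypothetical per-block bit-error-rate estimates, e.g., refreshed as blocks age.
    block_ber = {"blk0": 1e-12, "blk1": 1e-7, "blk2": 1e-6}
    for blk, ber in block_ber.items():
        print(blk, select_ecc(ber, target=1e-15))
```

Re-running `select_ecc` whenever a block's estimated error rate changes gives adaptation in time. Applying it per block gives adaptation in space. Blocks that remain reliable avoid the check-bit and decoding overhead of worst-case protection.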
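The core idea of mapping a computation to embedded memory, as in item (4), can be shown with a classic example. A multiplication by a constant coefficient, such as one tap of an FIR filter, is replaced by reads from small precomputed tables that could reside in an embedded memory block. This is a hedged illustration only. The 4-bit split, the 8-bit unsigned operand range, and the names `NIBBLE`, `build_const_mult_tables` and `lut_multiply` are assumptions. The project's framework for deciding which operations go to logic, DSP blocks, or EMBs is not modeled.

```python
NIBBLE = 4  # assumed operand split width; each table has 2**NIBBLE entries


def build_const_mult_tables(c: int):
    """Precompute the contents of two small memory tables for x * c.

    lo[i] = i * c covers the low nibble and hi[i] = (i << NIBBLE) * c covers the
    high nibble, so x * c for an 8-bit unsigned x becomes two reads and one add.
    """
    size = 1 << NIBBLE
    lo = [i * c for i in range(size)]
    hi = [(i << NIBBLE) * c for i in range(size)]
    return lo, hi


def lut_multiply(x: int, tables) -> int:
    """Evaluate x * c for 0 <= x < 256 using only table lookups and one addition."""
    lo, hi = tables
    mask = (1 << NIBBLE) - 1
    return lo[x & mask] + hi[(x >> NIBBLE) & mask]


if __name__ == "__main__":
    tables = build_const_mult_tables(37)
    print([lut_multiply(x, tables) for x in (0, 1, 200, 255)])
```

A single 256-entry table would remove the adder but needs eight times as many entries as the two 16-entry tables. Choosing such splits is one example of the memory-versus-logic trade-off that a heterogeneous mapping has to weigh.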