PIs: Sandeep K. Shukla (Virginia Tech) & Kenneth Stevens (University of Utah)

The increased momentum toward Globally Asynchronous, Locally Synchronous (GALS) design is driven by three factors. First, as CMOS feature sizes shrink and clock frequencies reach the gigahertz range, the clock period is approaching a limit at which efficient synchronous clocking across the whole chip becomes difficult. Second, IP-reuse-based System-on-Chip (SoC) design becomes easier if the IPs do not all have to be optimized for the same clocking scheme. Third, increasing power consumption in clock buffers, global repeaters, and the clock tree is a growing industrial concern.

However, designing GALS systems correctly is difficult and error-prone. With a few exceptions, asynchronous design tools and methodologies are almost nonexistent outside academia, so research on tools and methodologies for GALS design is imperative. Given the subtlety and complexity of these problems, we believe that grounding any GALS methodology in formal methods is both important and necessary. Implementing the protocols in real on-chip fabrics and experimentally validating the efficacy of such designs are also necessary for wide acceptance of the design methodologies.

In this project, the Virginia Polytechnic Institute and State University and University of Utah teams will collaborate to develop a formal basis for designing GALS solutions and experimenting with their trade-offs, and will fabricate such solutions on chip to calibrate them.

Beyond the research and educational impact of inventing new GALS techniques and methodologies for formally capturing and verifying the protocols, the broader impact of this project will include collaborative teamwork, training students to work in a distributed development environment, involving undergraduates in research, and a special effort to include minority students in the project.

Project Report

A latency-insensitive protocol (LIP) provides a synchronous solution for hardware components that need to be composed into a system-on-chip (SoC). Because of increased clock speeds, logic-gate counts, and die sizes, synchronous clocking of the entire chip is expensive and power-hungry. In this project we addressed the problem of designing LIP solutions for SoC design using existing synchronous components. A straightforward solution proposed early on was to implement handshakes on every inter-component connection, which desynchronizes the entire design. However, this kind of over-design is redundant and often results in SoCs with larger area and buffer sizes, increased latency, and reduced throughput and scalability. In this NSF-funded project, we created a front-to-back design framework for the specification, performance analysis, optimization, and implementation involved in converting a synchronous design into a latency-insensitive system. The flow starts from any synchronous design after floor planning, once multi-cycle wiring latencies are detected. We defined partial back pressure graphs (PBPG) as a model of the data flow. The activation of each synchronous component, and the throughput, can be represented in such a graph, and the PBPG serves as the formal model throughout the whole design process. Initially, the 'stall' signals that implement back pressure are removed completely, which may cause overflow if the PBPG representation of the system has multiple strongly connected components. We then define the strongly connected component graph (SCCG), on which boundedness can be checked. If the system can overflow, we search for the minimal back pressure needed to prevent it, and we designed an algorithm that adds a minimal number of back pressure arcs (MBPA). We showed that this problem can be reduced to the Minimum Cost Arborescence (MCA) problem for directed graphs, which has a polynomial-time (and thus efficient) solution.
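As an illustration of the SCCG step described above, the following minimal sketch computes the strongly connected components of a small directed graph and collapses each one into a single node. It is not the project's tool: the node names, the `scc_ids` and `condensation_arcs` helpers, and the example graph are hypothetical, and the report does not spell out how boundedness is actually checked on the SCCG.

```python
from collections import defaultdict

def scc_ids(nodes, arcs):
    """Tarjan's algorithm: map every node to the id of its strongly connected component."""
    succ = defaultdict(list)
    for u, v in arcs:
        succ[u].append(v)
    index, low, comp = {}, {}, {}
    stack = []
    n_comp = 0

    def visit(u):
        nonlocal n_comp
        index[u] = low[u] = len(index)   # DFS discovery number
        stack.append(u)
        for v in succ[u]:
            if v not in index:           # tree arc: recurse first
                visit(v)
                low[u] = min(low[u], low[v])
            elif v not in comp:          # v is visited and still on the stack
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:           # u is the root of a component
            while True:
                w = stack.pop()
                comp[w] = n_comp
                if w == u:
                    break
            n_comp += 1

    for u in nodes:
        if u not in index:
            visit(u)
    return comp

def condensation_arcs(arcs, comp):
    """Arcs of the component graph: one arc per pair of distinct components."""
    return {(comp[u], comp[v]) for u, v in arcs if comp[u] != comp[v]}

# Hypothetical data-flow graph: {A, B} and {C, D} are cycles joined by one forward arc.
nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "C")]
comp = scc_ids(nodes, arcs)
sccg = condensation_arcs(arcs, comp)
print(len(set(comp.values())), "components; SCCG arcs:", sccg)
```

In this example the arc from B to C crosses between two components, which is the kind of connection where, according to the report, removing all stall signals may lead to overflow.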
An optimized latency-insensitive design also requires that back pressure not reduce the throughput. Any resulting throughput reduction can be recovered through buffer resizing. Within this framework, we developed two techniques, RMILP and LMILP, to find the buffer sizes required to restore the throughput; both are more scalable than any prior method for latency-insensitive design. Finally, the framework refines the optimized formal model (the PBPG model), and our tool maps it back to circuits. Our approach provides a formal framework for converting a synchronous model into a latency-insensitive implementation with performance optimization of back pressure, throughput, and buffer sizes. The outcome of this research is useful for semiconductor companies that design large systems-on-chip, such as today's mobile phone chips, integrated graphics and processor chips, and many other system-on-chip designs used in medical devices, consumer electronics, and mission-critical applications. The research funded by this grant also spawned many other interesting projects on embedded software design methods, software synthesis methods, and the composition of software components into large software systems.
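The report states that choosing a minimal set of back pressure arcs reduces to the Minimum Cost Arborescence problem. As a rough sketch of why that problem is tractable, the code below computes the cost of a minimum cost arborescence with the standard Chu-Liu/Edmonds contraction scheme. The function name, the integer node numbering, and the edge costs are assumptions made for illustration; the report does not describe how the SCCG is mapped to an MCA instance or which MCA algorithm the project used.

```python
def min_arborescence_cost(n, root, edges):
    """Cost of a minimum cost arborescence rooted at `root`, or None if none exists.

    Nodes are 0..n-1 and `edges` is a list of (u, v, cost) directed arcs.
    Chu-Liu/Edmonds: pick the cheapest incoming arc of every node, contract
    any cycle those arcs form, and repeat on the smaller graph.
    """
    INF = float("inf")
    total = 0
    while True:
        # Step 1: cheapest incoming arc for every node.
        in_cost = [INF] * n
        pred = [-1] * n
        for u, v, w in edges:
            if u != v and w < in_cost[v]:
                in_cost[v], pred[v] = w, u
        if any(v != root and in_cost[v] == INF for v in range(n)):
            return None  # some node cannot be reached from the root
        in_cost[root] = 0

        # Step 2: add the chosen arcs and look for cycles among them.
        comp = [-1] * n      # id of the contracted node, -1 if not assigned yet
        seen = [-1] * n      # which start node last walked through this node
        n_comp = 0
        for v in range(n):
            total += in_cost[v]
            x = v
            while x != root and seen[x] != v and comp[x] == -1:
                seen[x] = v
                x = pred[x]
            if x != root and comp[x] == -1:   # walked back into our own path: a cycle
                y = pred[x]
                while y != x:
                    comp[y] = n_comp
                    y = pred[y]
                comp[x] = n_comp
                n_comp += 1
        if n_comp == 0:
            return total     # chosen arcs already form an arborescence

        # Step 3: contract each cycle into one node and reduce the arc costs.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = n_comp
                n_comp += 1
        edges = [(comp[u], comp[v], w - in_cost[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], n_comp

# Hypothetical instance: the cheap arcs between nodes 1 and 2 form a cycle,
# so one of the expensive arcs leaving the root must be used.
example_edges = [(0, 1, 10), (0, 2, 10), (1, 2, 1), (2, 1, 1)]
example_cost = min_arborescence_cost(3, 0, example_edges)
print(example_cost)
```

Each round either finishes or contracts at least one cycle, so the number of rounds is bounded by the number of nodes, which is what makes the overall procedure polynomial.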

Project Start:
Project End:
Budget Start: 2007-07-01
Budget End: 2012-06-30
Support Year:
Fiscal Year: 2007
Total Cost: $242,000
Indirect Cost:
City: Blacksburg
State: VA
Country: United States
Zip Code: 24061