The objective of the proposed research is to investigate and develop a comprehensive framework for the automatic synthesis of custom processors. The scope of the proposed framework includes all the necessary steps to generate the custom processor's architecture, starting from one or more embedded software programs that it is required to execute. The problems that we will tackle are as follows. We will develop techniques for efficient automatic generation of hardware extensions -- custom instructions and co-processors -- to a base processor platform, for large hierarchical application programs. The complexity of realistic application software, together with the multi-granular nature of the hardware extensions, necessitate the development of new techniques that start from a hierarchical program representation and identify custom instructions and co-processors that maximize performance or minimize energy consumption under given design constraints. The number of individual candidate custom instructions and co-processors can be quite large for even moderately sized programs, and hence the number of combinations thereof is even larger. We will develop techniques to efficiently explore the unified custom instruction and co-processor design space, and select an optimal combination of custom instructions and co-processors (from individual candidates) that maximizes performance or energy efficiency. High levels of hardware re-use are critical for deriving efficient implementations of custom processors. Various custom instructions derived for a given application may exhibit commonality that can be exploited to either reduce the area overhead or improve performance/energy impact under a given area constraint. In addition to conventional fine-grained resource sharing techniques, we will develop new coarse-grained sharing techniques to obtain high quality designs. Software transformations, if appropriately applied to the target application program before attempting to derive hardware extensions, can facilitate the generation of higher quality custom instructions or co-processors, leading to much higher performance and energy gains. We will develop a method to automatically apply a suitable sequence of enabling transformations to the application.

Efficiency and flexibility are two major requirements driving embedded system design. Unfortunately, these two requirements are typically at conflict with each other - performance and energy efficiency are often obtained by hardwiring functionality and optimizing the system in an application-specific manner, which limits flexibility, while flexibility is obtained through configurability and/or programmability, which carry associated overheads. Negotiating this tradeoff is critical in a wide variety of applications, ranging from high-performance systems to battery-driven systems.

Application-specific instruction set processors (ASIPs) offer a good tradeoff between efficiency and flexibility by realizing only the critical operations in the application(s) of interest using custom hardware. Conventional approaches to ASIP design are based on designing and implementing a new instruction set and processor architecture from scratch for each application. Unfortunately, the design turnaround time for such approaches may be large and is comparable to design cycles for custom hardware implementations. The recent evolution of customizable and extensible processor technology, such as Tensilica's Xtensa and ARC's ARCTangent processor cores, has provided embedded system designers with a mechanism to design ASIPs with rapid turnaround times through the use of re-targetable software development tool flows, and configurable soft intellectual property (IP). However, the task of customizing the processor and extending it with custom hardware (instruction units, co-processors, peripherals) are still largely manual and left to the designer's expertise. In order to realize the potential for energy efficiency as well as flexibility that ASIPs offer, it is necessary to develop high-level methodologies that automatically identify application hot-spots and map them to custom hardware that extends the underlying configurable platform.

No algorithms or tools exist for extensible processor platforms that address any of the above problems. We have taken the first step in this direction by developing algorithms and a tool to automatically identify custom instructions for extensible processors. This tool results in an average performance improvement of 3.4X (up to 5.4X) and an average energy-delay product improvement of 12.6X (up to 24.2X).

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
0310477
Program Officer
Sankar Basu
Project Start
Project End
Budget Start
2003-08-15
Budget End
2006-07-31
Support Year
Fiscal Year
2003
Total Cost
$306,000
Indirect Cost
Name
Princeton University
Department
Type
DUNS #
City
Princeton
State
NJ
Country
United States
Zip Code
08540