In a heterogeneous execution model, a general-purpose processor is accelerated by a special-purpose co-processor. For many applications, this approach can yield significant performance improvement at a relatively low cost. The most significant challenge for heterogeneous computing is the task of matching an arbitrary program to an effective co-processor architecture. This project investigates top-down design automation techniques for analyzing and adapting existing software to the heterogeneous execution model.
When adapting scientific software to a heterogeneous computing platform, the software's most expensive computation is performed on the co-processor, which is usually a custom-designed architecture implemented on an FPGA. This computation, referred to as the kernel, is usually a well-known numerical method or signal transformation. Scientific applications that rely on more exotic or obscure algorithms are rarely adapted for heterogeneous execution. One reason is that such applications may not have a well-defined kernel computation, making it difficult to determine which portions of the software, when mapped to the co-processor, will yield the highest overall performance improvement. Another reason is that manually designing special-purpose hardware that performs complex, iterative behavior requires considerable design effort as well as a high level of expertise in both hardware design and the specifics of the target application. To address these problems, this research develops a set of systematic techniques for analyzing the runtime behavior of software to determine which components of the software perform the most computation using the least volume of input and output data. The results of this analysis are used to perform hardware/software partitioning and to resolve dynamic memory references. In the next step, a compiler back-end will generate a finely parallelized co-processor architecture.
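The selection criterion described above (the most computation for the least volume of input and output data) can be illustrated with a small ranking sketch. This is not the project's actual analysis tool: the `ComponentProfile` record, the `rank_candidates` function, the component names, and all of the numbers are hypothetical, and the metric shown (observed operations divided by bytes transferred) is only one plausible way to express the criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentProfile:
    """Hypothetical runtime profile of one software component."""
    name: str
    operations: int  # operations observed while the component ran
    bytes_in: int    # data the component read from outside itself
    bytes_out: int   # data the component produced for the rest of the program

    @property
    def intensity(self) -> float:
        """Operations performed per byte of input and output data."""
        traffic = self.bytes_in + self.bytes_out
        return self.operations / traffic if traffic else float("inf")


def rank_candidates(profiles):
    """Order components from most to least computation per byte moved."""
    return sorted(profiles, key=lambda p: p.intensity, reverse=True)


# Illustrative numbers only; these are not measurements from any real program.
profiles = [
    ComponentProfile("parse_input", 2_000, 4_000, 4_000),
    ComponentProfile("inner_solver", 9_000_000, 1_000, 1_000),
    ComponentProfile("write_report", 5_000, 500, 20_000),
]

ranked = rank_candidates(profiles)
for p in ranked:
    print(f"{p.name}: {p.intensity:.2f} operations per byte")
```

Under this assumed metric, a component that performs a large amount of computation on a small amount of data ranks first, which makes it a natural candidate for the co-processor side of a hardware/software partition, since little data would need to cross the processor/co-processor boundary.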