The term Data-Intensive SuperComputing (DISC) has recently gained popularity; it covers applications that perform large-scale computations over massive datasets. Because of the increasing volume of data analyzed, the amount of computation involved, and the need for rapid or even interactive response, these applications have an ever-increasing demand for computational power. For the past two to three years, however, it has no longer been possible to improve processor performance simply by increasing clock frequencies. As a result, multi-core architectures and accelerators such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have become cost-effective means of scaling performance. Such architectures, however, create a programmability challenge for this class of applications.
This project targets a language-independent compiler and runtime framework that enables data-intensive applications to scale on a variety of modern and emerging highly parallel systems. Specifically, the target will be clusters of multi-core machines, where each node may additionally have an accelerator such as a GPU. The proposed system will build on our prior work on an earlier system, FREERIDE (FRamework for Rapid Implementation of Datamining Engines). By extending the FREERIDE framework, this project has the potential for impact in the areas of high-end computing, data mining, and scientific data processing.