Major social and economic change is being driven by the emergence of "big data." In all sectors of the economy, businesses increasingly rely on the ability to extract useful intelligence from massive relational data sets. Emergent applications are characterized by data-intensive computation in which massive parallelism is increasingly unstructured, hierarchical, workload dependent, and time varying. At the same time, energy and power considerations are driving computer architecture toward massively parallel heterogeneous organizations, such as multithreaded CPUs tightly integrated with bulk synchronous parallel (BSP) architectures such as general-purpose graphics processing units (GPUs). This evolution, driven by energy-efficiency concerns, has had a disruptive impact on modern software stacks, challenging our ability to extract the performance necessary to deal with big data. We need computing technologies that can harness the throughput potential of energy-efficient heterogeneous architectures for emergent applications that process massive relational data sets.
Realizing the potential of massively parallel heterogeneous architectures is inhibited by the unstructured, dynamic parallelism exhibited by applications in these domains. This research develops a suite of coordinated algorithm, compiler, and microarchitecture technologies that effectively exploit dynamic parallelism. The suite of techniques enables effective navigation of the tradeoffs among parallelism, locality, and data movement to realize optimized, high-performance implementations. First, the proposed program uses the language of sparse linear algebra to formulate algorithms that expose massive unstructured parallelism. Second, this formulation drives new compiler and run-time system optimizations tailored to the computational characteristics of these emergent applications and of heterogeneous hardware. Third, at the microarchitecture level, we propose new memory hierarchy management techniques tailored to exploiting dynamic parallelism. The integrated solutions (algorithm, compiler/run-time, and microarchitecture) are demonstrated on commodity platforms and delivered as an open-source software stack to support and enable community-wide research efforts. For U.S. businesses to exploit the new capabilities of heterogeneous architectures and systems for emerging applications, it is essential both to create new technology and to train employees with the skills to use it. Technology transfer and workforce impact will be promoted through the NSF Industry University Cooperative Research Center on Experimental Research in Computer Systems (CERCS, www.cercs.gatech.edu) at Georgia Tech, whose members include Intel, IBM, HP, and AMD; application-oriented companies such as LogicBlox and Intercontinental Commodity Exchange (ICE); and Department of Energy national laboratories such as Sandia and Oak Ridge. Similar impacts are expected through the NVIDIA Center of Excellence at Georgia Tech.
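To make the sparse-linear-algebra formulation concrete, the sketch below expresses breadth-first search as repeated sparse matrix-vector products: each frontier expansion becomes one bulk operation whose per-edge work is exposed as massive data parallelism. This is an illustrative example only, not the project's software; the graph, the `bfs_levels` function, and the use of SciPy's sparse matrices are assumptions made for the sketch.

```python
# Illustrative sketch: BFS in the language of sparse linear algebra.
# The adjacency matrix, function name, and SciPy dependency are
# assumptions for this example, not part of the proposed system.
import numpy as np
import scipy.sparse as sp

def bfs_levels(adj, source):
    """Return the BFS level of each vertex (-1 if unreachable).

    adj: CSR adjacency matrix; adj[i, j] != 0 means edge i -> j.
    Each frontier expansion is a single sparse matrix-vector
    product, so all edge traversals in the frontier are exposed
    as one bulk-parallel operation.
    """
    n = adj.shape[0]
    levels = -np.ones(n, dtype=int)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # One SpMV step: vertices reached from the frontier,
        # masked to exclude already-visited vertices.
        reached = (adj.T @ frontier.astype(np.int8)) > 0
        frontier = reached & (levels == -1)
        level += 1
    return levels

# Tiny example: path 0 -> 1 -> 2 plus an isolated vertex 3.
A = sp.csr_matrix(np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=np.int8))
print(bfs_levels(A, 0))  # levels: [0, 1, 2, -1]
```

Casting graph traversal this way is what lets a compiler and run-time system schedule the irregular per-edge work as structured sparse-matrix kernels, which is the style of optimization target the proposed compiler work addresses.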