Ongoing technology trends are accelerating scientific discovery by allowing researchers to generate enormous quantities of data, in domains ranging from computational biology to social networks. There is an urgent need for appropriate abstractions and parallel runtimes that make it easy and fast to extract useful information from these data. Work conducted under this project aims to make "big data" computing more readily available to applications with dynamic structure and irregular dependencies, thereby enabling advances in scientific computing in general and computational biology in particular.
This project extends the state of the art in scientific computing by developing programming abstractions to expose -- and run-time optimizations to exploit -- the parallelism available in large, irregular applications. Parallelism is essential for extracting useful information from ever increasing volumes of scientific data, but the irregularity of data structure and access in many problem domains makes efficient parallelization difficult. At the level of the programming model, the project addresses the challenge of irregularity by identifying design patterns for important new classes of applications -- in particular, those that use trees and graphs for data representation and access but whose traversals exhibit some exploitable structure. At the level of the run-time system, it is developing computational engines that support and exploit the new patterns, leveraging the exposed structure to map computational tasks to hardware nodes automatically and dynamically.
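To make the pattern concrete, the sketch below illustrates the general shape of such an abstraction: an irregular tree traversal expressed as recursive tasks, with a depth cutoff so that the known structure of the traversal bounds how many tasks must be mapped to hardware threads. This is not code from the project; the names (Node, tree_sum, build) and the use of standard C++ std::async in place of a dedicated run-time engine are illustrative assumptions.

    // Illustrative sketch only: a parallel tree reduction in which the
    // traversal's regular structure (a depth cutoff) limits task creation.
    #include <future>
    #include <iostream>
    #include <memory>

    struct Node {
        long value;
        std::unique_ptr<Node> left, right;
        explicit Node(long v) : value(v) {}
    };

    long tree_sum(const Node* n, int depth, int cutoff) {
        if (!n) return 0;
        if (depth >= cutoff) {
            // Below the cutoff: traverse sequentially, creating no new tasks.
            return n->value + tree_sum(n->left.get(), depth + 1, cutoff)
                            + tree_sum(n->right.get(), depth + 1, cutoff);
        }
        // Above the cutoff: hand one subtree to the runtime as an
        // asynchronous task and recurse into the other on the current thread.
        auto left = std::async(std::launch::async, tree_sum,
                               n->left.get(), depth + 1, cutoff);
        long right = tree_sum(n->right.get(), depth + 1, cutoff);
        return n->value + left.get() + right;
    }

    std::unique_ptr<Node> build(int height) {
        // Complete binary tree of the given height, every node holding 1.
        if (height == 0) return nullptr;
        auto n = std::make_unique<Node>(1);
        n->left = build(height - 1);
        n->right = build(height - 1);
        return n;
    }

    int main() {
        auto root = build(16);                            // 65535 nodes
        std::cout << tree_sum(root.get(), 0, 4) << "\n";  // prints 65535
    }

In the kind of run-time system described above, such placement and granularity decisions would be made dynamically by the computational engine rather than through a fixed cutoff hard-coded in the application; the sketch is intended only to show the shape of the pattern.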