It is often the case that the time and effort required to develop effective and efficient software on high-end computing systems is the bottleneck in many areas of science and engineering. This project is building a novel middleware framework called Global Graphs that targets this bottleneck. Global Graphs takes a data-structure centric view of shared data where graph-based dynamic data structures drive the development of the rest of the system.
A key scientific outcome of this proposed framework is to allow the programmer to have multiple views of the shared data as well as multiple views of the control and tasking model. This flexibility can be leveraged along a discrete scale of data and process views depending on whether the goal is to develop a quick prototype for validating ideas on small scale problems, or the goal is efficient realization on large scale problems, or something in between these two extremes. An additional outcome will be the development of a performance feedback engine that will provide the programmer insights into parts of the program to focus on for performance tuning.
The proposed work has important implications for a range of domains requiring the processing of large scale datasets, including data mining, scientific computing and XML data management. The broader outcomes of this work will be to train capable undergraduate and graduate students. The PIs are actively encouraging under-represented minorities to participate in this effort.
The ubiquity of parallel processor cores in current computing systems, heterogeneity in the form of high-performance accelerators like GPUs, and the rapidly changing nature of computer architecture make the task of developing and maintaining high-performance software extremely challenging. Existing programming models for high-performance computing reflect the characteristics of the underlying architecture and thereby enable high performance, but they are very tedious and error-prone to use. The task of developing efficient parallel applications is especially daunting for algorithms featuring unstructured computation on graph data structures. In this project, a user-friendly, high-level interface called Global Graphs was developed for complex distributed structures such as trees and graphs. Global Graphs provides a convenient global address view of the data structures, while still enabling scalable performance on distributed memory systems. The system utilizes a combination of novel representation (based on graph sparsification) targeting novel characteristics of modern applications (e.g. power-law nature of social network graphs), stratified data placement, coarse-grained data movement to enhance locality and communication efficiency, and a novel runtime system that facilitates work-stealing as a means to ensure scalable performance commensurate with advances in technology. The work associated with this project has led to publicly distributed software, and algorithms and articles that have made a significant impact, ranging from algorithms for processing dynamic networks to processing directed graphs. The effectiveness of the system was evaluated on a range of applications from keyword search algorithms (commonly found on web-search platforms) to social media applications and from Quantum Monte Carlo/gravitational simulations to data compression. The project has provided partial or full financial support to a number of students, and has enabled research training of four undergraduate students, four Masters students and six Ph. D. students; three women and one African-American student have contributed to the project.