Graph-structured relational data is ubiquitous throughout the social and physical sciences. Such datasets include personal relations in social and communication networks, protein-protein interactions within an organism, and particle interactions in physics simulations, among many other examples. By studying these graphs, researchers can develop techniques to identify harmful actors in social networks, discover novel protein interaction pathways, and better model interactions within a physical system. However, the large scale and complexity of these datasets make them particularly challenging to study, and studies often require computationally expensive analytical techniques. These challenges are further exacerbated by the complexity of the modern large-scale parallel computational systems on which such studies are often performed. Overcoming these challenges enables the real-time analysis of large-scale, constantly evolving social networks, in-depth studies of full-scale brain neural connectome graphs, and the general application of computationally intensive analytics to other massive relational datasets. The research in this project presents a set of highly interrelated approaches designed to address these challenges concurrently. Educational initiatives of this project include the development of classes that will introduce students in computer science, the physical sciences, and the social sciences to various aspects of graph theory and computational graph analytics. Students from high school through graduate level are being engaged as contributors to the project's various research goals. These initiatives are further fostering the involvement of students in research, high performance computing, and open-source software development.
Specifically, the research in this project is aimed at developing methods to enable complex computations on quadrillion+ edge graphs using current petascale and forthcoming exascale high-performance computing platforms. These methods fall under three broad thrusts. The first thrust relates to "Graph Layout", which is the way in which a graph dataset is partitioned, ordered, and stored in memory and out of core on a computational system. An outcome of this thrust is a high-quality, scalable means of optimizing graph layout with respect to data type, algorithmic pattern, and hardware platform. The second thrust considers "Architecture-centric Processing" of these datasets on modern high-performance systems. This thrust is researching how to efficiently map complex graph analytic problems to complex heterogeneous architectures, while considering multilevel computational models, asynchronous computations, and various graph layout methodologies. The third thrust involves the "Development of Scalable and Open-source Software" to enable the broader scientific community to easily address the challenges of the prior thrusts as they relate to their specific dataset, analytical problem, and hardware. This thrust is investigating how best to develop software frameworks and toolkits that scale to the massive (quadrillion+ edge) and irregular power-law graphs arising from these various domains, while running efficiently on next-generation exascale hardware.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.