CluE: Towards Scalable Primitives for Graph Operations

Zhao, Ben; Agrawal, Divyakant; El Abbadi, Amr; Wolski, Richard; Yan, Xifeng

Abstract

Many of today's data-intensive application domains, including searches on social networks and module searches in biological pathways, require complex queries on large graph datasets. Their highly connected nature means graph operations tend to "crawl" across many links, resulting in very large memory footprints that strain resources on today's commodity servers. In addition, current application platforms lack abstractions for graph operations, leaving developers to implement a variety of complex standalone graph query components. The PIs of the UCSB Massive Graphs In Clusters (MAGIC) project are developing a scalable infrastructure to provide high level abstractions for graph primitives, simplifying design of complex queries while addressing difficult challenges of maximizing data parallelism and adaptive graph partitioning across clusters.

Advances from this project will include data partitioning techniques, novel primitives for graph operations, and a software infrastructure that together expand the applicability of scalable cluster infrastructures such as MapReduce and Dryad to graph-oriented queries in social networks and biological graphs.

Project Report

This project funded three separate but complementary efforts at UC Santa Barbara studying different aspects of social networks, graphs, queries and analysis. In terms of technical contributions, this project has produced significant outcomes in the form of: a) new developments in identifying and mechanisms to search for patterns in large graphs; graph management and graph computing middleware systems for processing extremely large graphs in a distributed computing cluster environment; b) analytical models and results that model and predict the behavior of epidemics in online social networks, particularly those of competing epidemics that conflict with each other; c) probabilistic query processing mechanisms that approximate shortest path computation queries in O(1) time on massive graphs; measurements and analyses of user interactions, fake users (Sybils), and social spam on online social networks such as Facebook, Renren, Twitter etc; measurement-calibrated graph models for generating realistic synthetic graphs. In addition to the academic publications, this project has produced significant impact in terms of both datasets and deployed software. Data measurements and analysis of social models has produced some widely distributed, anonymized graphs from Facebook (distributed to over 130 different research groups at Labs and Universities around the world), as well as synthetic graph generators used by numerous institutions. The project has also produced several pieces of hardware: a) SEDGE is a graph management system which has been downloaded 60+ times in the two weeks it has been available; b) Rigel is a system for approximating shortest path queries, which has been downloaded widely, and has now been adopted and used internally by several social networks, including Zynga, Renren, Google, and LinkedIn.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0847925
Program Officer: Vasant G. Honavar

Project Start
Project End
Budget Start: 2009-04-01
Budget End: 2012-03-31
Support Year
Fiscal Year: 2008
Total Cost: $482,000
Indirect Cost

CluE: Towards Scalable Primitives for Graph Operations
Zhao, Ben Agrawal, Divyakant El Abbadi, Amr Wolski, Richard Yan, Xifeng
University of California Santa Barbara, Santa Barbara, CA, United States

Abstract

Project Report

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Project Report

Funding Agency

Institution

Comments