Many of today's data-intensive application domains, including searches on social networks and module searches in biological pathways, require complex queries on large graph datasets. Their highly connected nature means graph operations tend to "crawl" across many links, resulting in very large memory footprints that strain resources on today's commodity servers. In addition, current application platforms lack abstractions for graph operations, leaving developers to implement a variety of complex standalone graph query components. The PIs of the UCSB Massive Graphs In Clusters (MAGIC) project are developing a scalable infrastructure to provide high level abstractions for graph primitives, simplifying design of complex queries while addressing difficult challenges of maximizing data parallelism and adaptive graph partitioning across clusters.

Advances from this project will include data partitioning techniques, novel primitives for graph operations, and a software infrastructure that together expand the applicability of scalable cluster infrastructures such as MapReduce and Dryad to graph-oriented queries in social networks and biological graphs.

Project Report

This project funded three separate but complementary efforts at UC Santa Barbara studying different aspects of social networks, graphs, queries and analysis. In terms of technical contributions, this project has produced significant outcomes in the form of: a) new developments in identifying and mechanisms to search for patterns in large graphs; graph management and graph computing middleware systems for processing extremely large graphs in a distributed computing cluster environment; b) analytical models and results that model and predict the behavior of epidemics in online social networks, particularly those of competing epidemics that conflict with each other; c) probabilistic query processing mechanisms that approximate shortest path computation queries in O(1) time on massive graphs; measurements and analyses of user interactions, fake users (Sybils), and social spam on online social networks such as Facebook, Renren, Twitter etc; measurement-calibrated graph models for generating realistic synthetic graphs. In addition to the academic publications, this project has produced significant impact in terms of both datasets and deployed software. Data measurements and analysis of social models has produced some widely distributed, anonymized graphs from Facebook (distributed to over 130 different research groups at Labs and Universities around the world), as well as synthetic graph generators used by numerous institutions. The project has also produced several pieces of hardware: a) SEDGE is a graph management system which has been downloaded 60+ times in the two weeks it has been available; b) Rigel is a system for approximating shortest path queries, which has been downloaded widely, and has now been adopted and used internally by several social networks, including Zynga, Renren, Google, and LinkedIn.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0847925
Program Officer
Vasant G. Honavar
Project Start
Project End
Budget Start
2009-04-01
Budget End
2012-03-31
Support Year
Fiscal Year
2008
Total Cost
$482,000
Indirect Cost
Name
University of California Santa Barbara
Department
Type
DUNS #
City
Santa Barbara
State
CA
Country
United States
Zip Code
93106