Graph analytics is a canonical Big Data problem of significant value to the long tail of science, from the social sciences to genomics. While graph algorithms for big-memory machines abound, they remain inaccessible to the wider community. Developing appropriate abstractions for graph applications on distributed cyber-infrastructure such as Clouds and commodity clusters has proven challenging. This work explores a subgraph-centric approach that offers the potential for an order-of-magnitude performance benefit. Specifically, it investigates graph algorithms for de novo plant genome sequencing that use a scalable subgraph-centric graph programming model for Clouds. This is a novel research direction that can profoundly impact next-generation genome sequencing, as well as other domains where graph abstractions can be employed. By contributing a critical mass of subgraph-centric algorithms, the work catalyzes research into distributed graph analytics and mitigates the opportunity cost of delayed adoption of this technology and of domain-specific computing abstractions. In the process, it will fundamentally advance scalable graph processing, rapidly accelerating and democratizing cyber-infrastructure for Big Data in next-generation sequencing.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1355377
Program Officer: M. Mimi McClure
Budget Start: 2013-10-01
Budget End: 2015-09-30
Fiscal Year: 2013
Total Cost: $99,462
Institution: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089