In this project, we are building a graph data management system and a suite of tools aimed at supporting real-time, historical, and analytics queries over very large, dynamic, heterogeneous, and noisy information networks. Examples of such information networks include social networks, communication networks, financial transaction networks, citation networks, gene regulatory networks, disease transmission networks, ecological food networks, sensor networks, social contact graphs, and many more. Network data is most naturally represented as a graph, with nodes representing the entities and edges denoting the interactions between them. There is, however, a lack of established data management systems that provide declarative frameworks for querying and analysing such graph-structured data, especially very large volumes of heterogeneous, complex-structured, and rapidly changing data.

In this project, we are developing a set of formalisms that include: (a) a declarative query language for graph data, (b) a declarative framework for specifying complex, iterative network analysis tasks like entity resolution, link prediction, etc., and (c) a general-purpose neighborhood-centric distributed programming framework. Our declarative interfaces and the programming framework are based on "Datalog", a well-established database query language, providing the users or the analysts a consistent abstraction of the graph data to specify their queries or tasks.

We are designing a suite of techniques, algorithms, and index data structures, to efficiently store large volumes of time-evolving graph data, and to execute queries and analysis tasks over it. We are addressing the challenges in minimizing network communication overhead during distributed computation through designing new partitioning and adaptive replication techniques. We are also developing a compression-based approach to minimize the resources needed for graph processing, and a framework for extrapolating missing historical information to enable querying over incomplete historical traces.

Managing and reasoning about graph data is increasingly becoming crucial in many real-world application domains including social media, e-science, disease epidemics, and financial markets, to name a few. The frameworks and tools that we are developing make it easier and more intuitive for domain experts and analysts to process, analyze, and extract insights from large volumes of dynamic time-evolving graph data. Our system enables temporal evolutionary analytics over very large historical traces, and continuous and real-time analytics over highly dynamic graphs, thus enabling a rich class of applications that would not have been possible before. The declarative frameworks and the query language that we are developing have the potential to transform and streamline the highly fragmented research area of graph query processing and analytics. This project provides research opportunities for graduate and undergraduate students, and is aligned with several undergraduate and graduate courses offered by the PI. For further information, see the project web site at: www.cs.umd.edu/~amol/GrDB

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1319432
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2013-09-01
Budget End
2017-08-31
Support Year
Fiscal Year
2013
Total Cost
$500,000
Indirect Cost
Name
University of Maryland College Park
Department
Type
DUNS #
City
College Park
State
MD
Country
United States
Zip Code
20742