This proposal extends current research in creating a query language for provenance to a broad range of application domains by designing, developing, and evaluating a general-purpose query language for graph-oriented data.

PQL, the path query language, was designed to address the challenges encountered in expressing queries on provenance or lineage data, but it was also conceived as the foundation for a general-purpose data model and query language for manipulating any type of graph-oriented data. Graph-oriented data arises in many disciplines, including computer networking, information retrieval, biology, web search, social networking, and genealogy. A characteristic that unites all these domains is the need to express queries on paths through a graph. Most existing solutions have either a weak or a non-existent notion of paths as first-class entities that can be named, compared, manipulated, and constructed. PQL addresses this problem.

Derived from the semi-structured database language Lorel, PQL operates on semi-structured data, which can be viewed as a collection of objects linked together. In PQL, these links are unidirectional, although queries can traverse them both forward and backward. Queries are expressed by selecting and filtering one or more paths in the graph, where paths can be described by regular expressions. Thus, in the provenance domain, one can ask for "all paths in the graph between invocations of a compiler and the resulting executables." In biology, one might pose a query about "all paths from a particular combination of gene expressions to resulting insulin production."
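To make the idea of regular-expression path queries concrete, the following is a minimal Python sketch, not PQL syntax and not the project's implementation. It assumes a toy provenance graph of labeled, unidirectional links; the node names, the edge labels `input` and `output`, and the helper functions `adjacency` and `matching_paths` are all hypothetical and chosen only for illustration. A path is selected when the sequence of its edge labels matches a regular expression, and the same links can be followed backward.

```python
import re
from collections import defaultdict

# Hypothetical toy provenance graph: (source, edge label, target).
EDGES = [
    ("hello.c", "input", "gcc#1"),
    ("gcc#1", "output", "hello.o"),
    ("hello.o", "input", "ld#1"),
    ("ld#1", "output", "hello"),
    ("notes.txt", "input", "cp#1"),
    ("cp#1", "output", "notes.bak"),
]


def adjacency(edges, backward=False):
    """Index the unidirectional links, optionally reversing them."""
    adj = defaultdict(list)
    for src, label, dst in edges:
        if backward:
            src, dst = dst, src
        adj[src].append((label, dst))
    return adj


def matching_paths(edges, start, pattern, backward=False):
    """Return every simple path from `start` whose edge labels,
    joined by single spaces, fully match the regular expression."""
    adj = adjacency(edges, backward)
    regex = re.compile(pattern)
    results = []

    def walk(node, nodes, labels):
        if labels and regex.fullmatch(" ".join(labels)):
            results.append(list(nodes))
        for label, nxt in adj.get(node, ()):
            if nxt not in nodes:  # keep paths simple (no repeated nodes)
                walk(nxt, nodes + [nxt], labels + [label])

    walk(start, [start], [])
    return results


# Forward query: everything derived from hello.c through process steps.
for path in matching_paths(EDGES, "hello.c", r"input output( input output)*"):
    print(" -> ".join(path))

# Backward query: the ancestry of the executable, following links in reverse.
for path in matching_paths(EDGES, "hello", r"output input( output input)*",
                           backward=True):
    print(" <- ".join(path))
```

The sketch enumerates simple paths exhaustively, which is adequate for a small example but would not scale to large graphs; a real query engine would likely evaluate the regular expression incrementally during traversal.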

The work described in this proposal extends PQL to include update syntax and semantics and the ability to produce query results that are graphs constructed from components of the original. The results of this work will be captured in both a machine-checkable formal specification of the language and an open-source PQL implementation, complete with a (replaceable) back-end implementation.

Further information on the project can be found at the project web page: www.eecs.harvard.edu/~syrah/pass/

Budget Start: 2008-09-01
Budget End: 2009-08-31
Fiscal Year: 2008
Total Cost: $130,000
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138