Cloud-based data processing platforms are an increasingly attractive option for large-scale data processing, and their use for various types of processing tasks on large-scale unstructured and structured data is under active investigation. At the same time, growing interest in many communities in more automatic sharing and exchange of data on the Web using Semantic Web techniques has led to a rapid surge in the availability of very large, real-world Semantic Web datasets. Such data are semi-structured and have more complex processing requirements than relational data, owing to the fine-grained modeling of data and the need for inferencing during processing. Consequently, existing optimization techniques for cloud data processing platforms, which often adapt relational optimization techniques, do not address the needs of such workloads. Further, such techniques do not adequately account for the nuances of cloud runtime platforms such as Hadoop, e.g., dataflow length as a cost metric and the absence of a priori indexes and statistics.

This project contributes insight into query optimization requirements for Semantic Web data processing on MapReduce platforms. Its contributions include a novel Nested TripleGroup data model and Algebra (NTGA); algebraic and dynamic cost query optimization techniques; inter- and intra-work sharing techniques; data representation formats; and system architecture issues of integrating Semantic Web optimization techniques into frameworks such as Apache Pig. The impact of this project will cut across the growing range of communities that are aggressively adopting Semantic Web tenets, including scientific, business, government, and other general-purpose communities.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1218277
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2012-09-01
Budget End: 2017-06-30
Support Year:
Fiscal Year: 2012
Total Cost: $446,942
Indirect Cost:
Name: North Carolina State University Raleigh
Department:
Type:
DUNS #:
City: Raleigh
State: NC
Country: United States
Zip Code: 27695