The Resource Description Framework (RDF) has, in recent years, become an increasingly important data and knowledge representation formalism for a broad range of applications, including the World Wide Web. With rapid growth in the size of RDF datasets, there is a growing need for scalable and efficient technologies for storing, indexing, and querying RDF datasets that are trillions of triples in size. This project, led by Dr. Praveen Rao of the University of Missouri-Kansas City, aims to address this need by developing: (1) a novel approach to storing, indexing, and querying RDF data that treats graphs as first-class citizens and uses RDF signatures, RDF signature indexes, and line graphs to reduce the cost and number of joins required for graph pattern matching; (2) a new approach for parallel SPARQL query processing on cloud platforms using data distribution schemes based on RDF signatures, a location index for quickly finding RDF graphs of interest across computing nodes, and a gossip-driven query execution model; and (3) a new approach for selectivity estimation of RDF graph patterns for query optimization, based on new gossip algorithms for cardinality estimation of RDF graph patterns and a divide-and-conquer method for effective load balancing and improved accuracy.
The broader impacts of this project include new courses covering topics in RDF data management and cloud computing, a scalable RDF reasoning tool over cancer data for oncologists, new cloud services for very large RDF data stores, and increased opportunities for research-based advanced training of undergraduate and graduate students, including women. The results of this research, including publications, software, and data sets, will be freely shared with the broader community. Additional information about the project can be accessed through the project website at http://vortex.sce.umkc.edu/ric.html.