Linked Open Data (LOD) is rapidly developing into an open data movement to connect a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 25 billion RDF triples and is growing exponentially. However, simply putting collections of data on the Web will be of very limited value. The key to unlocking that value for more powerful search, browsing, exploration and analysis is to richly interlink, or semantically integrate, the components of LOD. Given the size, growth rate, heterogeneity and expanding coverage of LOD, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on the 'same-as' relationship, which is widely abused because of its limited expressivity. This calls for ways to represent and identify richer, more explicit relationships between entities that reflect the variety of relations that exist in the real world.

This project develops exploratory techniques to richly interlink components of LOD and then addresses the challenge of querying the LOD cloud, i.e., obtaining answers to questions that require accessing, retrieving and combining information from different parts of the LOD cloud. Techniques for overcoming semantic heterogeneity include semantic enrichment through Wikipedia bootstrapping, semantic integration through abstraction by means of upper-level ontologies, and massively parallel methods for tractable ontology reasoning. Specifically, this research will: (1) identify richer, broader and more relevant relationships between LOD datasets at the instance and schema levels (these relationships will promote better knowledge discovery, querying and ontology mapping); (2) realize LOD query federation through an upper-level ontology; and (3) enable access to implicit knowledge through ontology reasoning. The project involves significant risk because it treads new paths in new terrain, primarily due to the lack of descriptive information (schema) about the data provided by highly autonomous data sources, the substantial syntactic and semantic heterogeneity among data originating from independent sources, the much larger scale, and the unforeseeable obstacles associated with a rapidly changing and expanding environment.

This project aims to advance the state of the art in semantic integration of large amounts of heterogeneous and autonomously developed or managed data. It seeks to fundamentally transform the landscape of LOD usage, because successful LOD querying is a key enabler for a variety of applications. The results of this project could set the stage for the development and far-reaching adoption of the Semantic Web. The project is integrated with education and research-based advanced training of graduate and undergraduate students. Additional information about the project can be found at: http://knoesis.org/research/semweb/projects/ESQuILO.

Project Report

Linked Open Data (LOD) is rapidly developing into an open data movement to connect a large variety of data across the World Wide Web using standards adopted by the World Wide Web Consortium (W3C). Driven by researchers, government agencies and companies, the resulting Web of Data has grown to over 1000 datasets and is growing exponentially. However, simply putting collections of data on the Web will be of very limited value. The key to unlocking that value for more powerful search, browsing, exploration and analysis is to richly interlink, or semantically integrate, the components of LOD. Given the size, growth rate, heterogeneity and expanding coverage of LOD, manual semantic integration or interlinking is not practical. Furthermore, current techniques focus on the owl:sameAs construct, which is abused because of its limited expressiveness and is therefore ineffective or yields poor-quality integration. What is needed is the ability to represent and identify richer, more explicit relationships between entities, so that the richness of the real world is not forced inaccurately and inappropriately into a few very limited types of relationships. At the same time, the exponential growth of LOD in size and diversity makes it harder to identify and analyze datasets for consumption by both humans and applications. Although popular datasets such as DBpedia, Freebase and MusicBrainz are well known and widely used in the community, there may be other hidden gems that are useful for specialized applications. To address these challenges, this project developed exploratory techniques to richly interlink components of LOD, addressed the challenges of querying the LOD cloud, and proposed approaches to discover datasets, compress them, and create entity summaries.
Specifically, we worked on:
- Identifying more expressive relationships among instances in datasets, such as partonomic (part-of) relationships, which are well-established, fundamental relations grounded in linguistics and philosophy.
- Identifying alignments among properties in the datasets, which is considered as important as concept or instance alignment because properties capture how two concepts and/or instances are related.
- Developing alignment-based LOD query federation through an upper-level ontology.
- Identifying relevant datasets for a given need by automatically creating domain descriptions for LOD datasets, since highly autonomous data sources provide little descriptive information (schema) about their data.
- Creating diversified entity summaries to quickly analyze the entities in the datasets.
- Developing lossless compression techniques for the large RDF datasets in the LOD cloud.
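The report does not describe how its lossless RDF compression works. Purely as a point of reference, a common lossless baseline is dictionary encoding: each distinct RDF term is stored once and every triple is rewritten as a tuple of integer IDs (the HDT format uses a dictionary component along these lines). The minimal Python sketch below illustrates only that general idea, not the project's technique; the function names and the example.org IRIs are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def dictionary_encode(triples: List[Triple]) -> Tuple[List[str], List[EncodedTriple]]:
    """Hypothetical helper: map each distinct term to an integer ID (in first-seen order)
    and rewrite every triple as a (subject, predicate, object) ID tuple."""
    term_to_id: Dict[str, int] = {}
    dictionary: List[str] = []  # dictionary[i] is the term with ID i
    encoded: List[EncodedTriple] = []
    for subject, predicate, obj in triples:
        ids = []
        for term in (subject, predicate, obj):
            if term not in term_to_id:
                term_to_id[term] = len(dictionary)
                dictionary.append(term)
            ids.append(term_to_id[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return dictionary, encoded


def dictionary_decode(dictionary: List[str], encoded: List[EncodedTriple]) -> List[Triple]:
    """Hypothetical helper: invert dictionary_encode by looking up each ID."""
    return [(dictionary[s], dictionary[p], dictionary[o]) for s, p, o in encoded]


if __name__ == "__main__":
    # Hypothetical example triples using a part-of style predicate.
    triples = [
        ("<http://example.org/Dayton>", "<http://example.org/partOf>", "<http://example.org/Ohio>"),
        ("<http://example.org/Ohio>", "<http://example.org/partOf>", "<http://example.org/UnitedStates>"),
    ]
    dictionary, encoded = dictionary_encode(triples)
    print(dictionary)
    print(encoded)
    # Decoding recovers the input exactly, which is what makes the scheme lossless.
    print(dictionary_decode(dictionary, encoded) == triples)
```

Repeated terms such as the shared predicate are stored only once, and the integer ID tuples are typically much smaller than the full IRIs; a real compressor would additionally compress the dictionary and the ID stream.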

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1143717
Program Officer: Sylvia Spengler
Budget Start: 2011-09-01
Budget End: 2014-08-31
Fiscal Year: 2011
Total Cost: $141,828
Name: Wright State University
City: Dayton
State: OH
Country: United States
Zip Code: 45435