Most of the work on the Semantic Web, and more broadly on information integration, assumes that accurate semantic models of sources exist. In practice, although there is a tremendous amount of data available on the Web, there is rarely any semantic description of the sources and the information they provide. This project will develop a new approach that addresses the problem of automatically discovering and modeling sources by building on the recent development of Linked Open Data within the Semantic Web. The resulting system will be able to learn about any source for which there is background knowledge in the Linked Open Data. This work will be a significant advance over previous work, since it will allow an intelligent system to expand its knowledge by learning models of sources that cover information for which the system has no known sources. The system will start with all of the data and knowledge in the Linked Open Data as well as semantic descriptions of some related services (through the work on Linked Open Services). The system will then learn how a new source relates to the known sources by exploiting the knowledge already available in the Linked Open Data. The result will be a rich semantic description of the individual sources that can be used in Semantic Web applications and information integration systems.
The ability to automatically discover and learn detailed semantic descriptions across a range of sources, going beyond the current source descriptions, will greatly expand the utility of Semantic Web and information integration systems. This capability will allow people and systems to better exploit the massive amount of data available today on the Internet and will provide a tool to keep up with its growth. Within the bioinformatics world, for example, the amount of data continues to grow rapidly, and the ability to find and structure this data will have a significant impact on the ability of researchers to fully exploit this information to answer biomedical research questions, such as finding more effective treatments for cancer. More information about the project can be found at www.isi.edu/integration/people/knoblock/projects/prj_source_modeling.html