The Semantic Web is an emerging technology that stipulates that each Internet web site provide self-describing metadata about its content, encoded in standard graph representations (using the OWL and RDF languages). On the conventional Internet, it is optional for web sites to provide topical descriptions of their content. Further, the optional methods that exist simply allow developers to list keywords; they provide no way to specify the meaning of those keywords, i.e., their semantics.
The goal of this project is to develop and demonstrate algorithms that leverage these new, semantic aspects of the Internet and make it much easier to treat multiple web sites and their underlying databases as a single, unified database. This contrasts with the existing Internet, where a browser lets people view documents, and data presented as documents, from different web sites in a single place. While Internet browsers are now ubiquitous and trivially intuitive to use, creating computer applications that process data from multiple web sites remains a highly skilled and labor-intensive process.
This project has two components. The first helps create the Semantic Web by developing methods that automatically map existing databases to the Semantic Web graph languages. These methods use data mining techniques to discover semantics that are already present in relational databases, but not explicitly encoded, and recast those semantics in graph-based form. The second component is the development of a distributed query execution environment for processing graph-structured queries expressed in SPARQL.
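As a rough illustration of what mapping a relational database to RDF and then querying the result involves, the Python sketch below converts rows to triples in a way loosely modeled on the W3C Direct Mapping (primary keys become row IRIs, columns become predicates, foreign keys become links between rows) and evaluates a SPARQL-style basic graph pattern by nested-loop matching. The base IRI, the table and column names, and the functions `map_table` and `match` are hypothetical. This is a minimal single-machine baseline under those assumptions, not the project's data-mining method or its distributed query engine, neither of which is detailed here.

```python
from urllib.parse import quote

# Hypothetical base IRI for the mapped database.
BASE = "http://example.org/db/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_iri(table, pk_col, pk_val):
    """IRI for one row, built from the table name and its primary-key value."""
    return f"{BASE}{quote(table)}/{quote(pk_col)}={quote(str(pk_val))}"


def map_table(table, pk_col, rows, foreign_keys=None):
    """Map relational rows to (subject, predicate, object) triples.

    foreign_keys maps a column name to (referenced_table, referenced_pk_col);
    in this simplified sketch such columns become links between row IRIs
    instead of literal values.
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subj = row_iri(table, pk_col, row[pk_col])
        triples.append((subj, RDF_TYPE, BASE + quote(table)))
        for col, val in row.items():
            if val is None:  # NULL values produce no triple
                continue
            if col in foreign_keys:
                ref_table, ref_pk = foreign_keys[col]
                pred = f"{BASE}{quote(table)}#ref-{quote(col)}"
                triples.append((subj, pred, row_iri(ref_table, ref_pk, val)))
            else:
                pred = f"{BASE}{quote(table)}#{quote(col)}"
                triples.append((subj, pred, val))
    return triples


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Pattern terms starting with '?' are variables; all other terms must
    equal the corresponding triple component exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                if term in new and new[term] != value:
                    ok = False  # variable already bound to a different value
                    break
                new[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


dept = map_table("Dept", "id", [{"id": 10, "name": "CS"}])
emp = map_table("Emp", "id",
                [{"id": 1, "name": "Ada", "dept": 10},
                 {"id": 2, "name": "Bob", "dept": None}],
                foreign_keys={"dept": ("Dept", "id")})
graph = dept + emp

# Roughly: SELECT ?n ?d WHERE { ?e Emp#name ?n . ?e Emp#ref-dept ?x . ?x Dept#name ?d }
query = [("?e", BASE + "Emp#name", "?n"),
         ("?e", BASE + "Emp#ref-dept", "?x"),
         ("?x", BASE + "Dept#name", "?d")]
for b in match(query, graph):
    print(b["?n"], b["?d"])  # prints: Ada CS
```

The foreign key is what lets the query join an employee to a department name; a row with a NULL foreign key (Bob) simply contributes no link and drops out of the join. Discovering such relationships when they are not declared in the schema is the kind of implicit semantics the first component targets.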
The PI is an invited expert on the W3C working group developing standards for relational-database-to-RDF translation (RDB2RDF), the subject of this research. More information on this project can be found at www.cs.utexas.edu/~miranker/SemanticWeb.html.