Search engines use semantic annotations, which remain well hidden behind the familiar search box interface, to provide search results. Beyond web search and recent technologies such as virtual assistants, semantically enriched data appear in a wide spectrum of application domains, including, but not limited to, bioinformatics, neuroscience, health care, and the social and psychological sciences. Access to semantic data, however, has been restricted to those intimately familiar with Semantic Web technologies, standards, protocols, data formats, and query languages. This project aims to substantially reduce the effort and expertise required to access and analyze semantically enriched data, and thereby to increase the range of applications that can benefit from such data. Furthermore, this award will support the training of PhD and undergraduate students and the development of a graduate-level course on the statistical analysis of semantically enriched data at the State University of New York at Albany.
The technical aims of the project are divided into two thrusts. The first thrust will develop a general approach to support simple and intuitive, yet functional, visual semantic querying. In particular, algorithms will be devised to automatically construct semantic queries from keywords provided through a search-like interface. The second thrust will open up the statistical exploration, analysis, and predictive modeling of semantic data. Specifically, new primitives based on ideas from statistics and information theory will be incorporated directly into SPARQL, the query language for retrieving semantic data and discovering relationships within it. An ontology will be designed to support alternative statistical operations, and computational methods and algorithms for query answering in this setting will be developed. These research aims will be complemented by a comprehensive evaluation plan.
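As a purely illustrative sketch of the first thrust, the Python fragment below shows one simple way keywords could be turned into a SPARQL query. It is not the project's algorithm: the `VOCABULARY` mapping, the `http://example.org/onto#` IRIs, and the function name `keywords_to_sparql` are all hypothetical, and the strategy of linking consecutive matches through an unconstrained predicate variable is an assumption made only for this example.

```python
from typing import Dict, List

# Hypothetical keyword-to-class mapping; a real system would presumably
# derive this from an ontology or an index over the data.
VOCABULARY: Dict[str, str] = {
    "gene": "http://example.org/onto#Gene",
    "disease": "http://example.org/onto#Disease",
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def keywords_to_sparql(keywords: List[str], limit: int = 10) -> str:
    """Build a SELECT query with one typed variable per recognized keyword."""
    variables: List[str] = []
    patterns: List[str] = []
    for word in keywords:
        iri = VOCABULARY.get(word.lower())
        if iri is None:
            continue  # unrecognized keywords are simply ignored in this sketch
        var = f"?x{len(variables)}"
        variables.append(var)
        patterns.append(f"  {var} <{RDF_TYPE}> <{iri}> .")
    if not variables:
        raise ValueError("no keyword matched the vocabulary")
    # Assumption: consecutive matches are related through some predicate,
    # left unconstrained so the query discovers which one.
    for i, (left, right) in enumerate(zip(variables, variables[1:])):
        patterns.append(f"  {left} ?p{i} {right} .")
    body = "\n".join(patterns)
    return f"SELECT {' '.join(variables)} WHERE {{\n{body}\n}} LIMIT {limit}"


if __name__ == "__main__":
    print(keywords_to_sparql(["gene", "disease"]))
```

Calling `keywords_to_sparql(["gene", "disease"])` yields a query that asks for pairs of resources typed as the two hypothetical classes and connected by any predicate, which is the kind of query a user would otherwise have to write by hand.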
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.