The proposal is to support multi-stage queries by integrating knowledge from across different queries, creating a "knowledge mosaic." This approach, if successful, could radically improve and extend web search from primarily fact-finding to fact-finding and problem solving. Problem solving then uses a set of related queries, each of which finds portions of information on graph models; these pieces are "stitched together" so that the relationships and links between them are better revealed. The knowledge mosaic is built using RDF and other graph-based Semantic Web technologies and by extending a proposed new query language, SPARQ2L. The resultant view keeps track of what knowledge has been found and attempts to combine it in useful ways to provide a more complex knowledge view of the information the user seeks to discover. The research could have a pronounced impact on how search and discovery are performed on the web.

Project Report

Knowledge discovery tasks typically involve iterative data interrogation or querying processes which are not adequately supported by existing querying techniques. In the first place, traditional query primitives support a "pattern matching" paradigm where a user knows a pattern of interest and wishes to find all occurrences of that pattern. However, knowledge discovery scenarios often involve gaps in user knowledge, so that complete patterns are unknown. Further, many kinds of questions that need to be posed on data do not fit a "find set of matches" paradigm but are more complex. Finally, in iterative or multi-staged knowledge discovery scenarios, users accumulate bits of knowledge over their series of queries but lack enough knowledge to "connect the dots". Support for this "connecting the dots" task is absent in traditional querying, where queries are "single shot" events and so do not capture the iterative nature of the process. Consequently, there is a need to develop advanced query models that support the creation of a Knowledge Mosaic by connecting dots across the queries that are part of a knowledge discovery task.
Intellectual Merit. Semantic Web data models explicitly represent typed relationships, making it possible to highlight "meaningful connections" to users, rather than just connections. Consequently, Semantic Web data models were chosen as the context for investigating Mosaic-oriented querying paradigms. The Mosaic project explores different notions of "connecting the dots" across queries in a discovery task, depending on whether the queries on RDF data are structured or unstructured (keyword queries). The second major focus of the Mosaic project is to address the limits of the mainstream interrogation model (pattern matching) in knowledge discovery tasks by developing new, richer query models that allow more complex questions to be posed.
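The contrast between pattern matching and connecting the dots can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the project's query model or SPARQ2L itself: the triples, the function names `match` and `connect`, and the use of breadth-first search are all hypothetical choices made for the example.

```python
from collections import deque

# Hypothetical toy RDF-style triples (subject, predicate, object).
TRIPLES = [
    ("alice", "authorOf", "paper1"),
    ("paper1", "cites", "paper2"),
    ("bob", "authorOf", "paper2"),
    ("bob", "worksAt", "lab42"),
]


def match(triples, s=None, p=None, o=None):
    """Pattern matching: return the triples that agree with every non-None slot."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def connect(triples, source, target):
    """Connecting the dots: breadth-first search for a shortest typed path.

    Edges are traversed in both directions; an inverse traversal is labeled
    with a leading "^". Returns a list of (node, label, next_node) steps,
    or None if the two resources are not connected.
    """
    neighbors = {}
    for s, p, o in triples:
        neighbors.setdefault(s, []).append((p, o))
        neighbors.setdefault(o, []).append(("^" + p, s))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for label, nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None
```

Here `match(TRIPLES, p="authorOf")` requires the user to already know the pattern of interest, whereas `connect(TRIPLES, "alice", "lab42")` takes two resources found by separate queries and returns the typed relationships that link them.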
The outcomes of the research revolve around three major themes:
(i) Indexing and storage for complex path queries on RDF graphs: since path computation is a natural way of connecting the dots in graph models such as RDF, techniques for dealing with the poor I/O access patterns of graph navigation were developed, based on a linear algebraic foundation for solving path problems.
(ii) Context-Aware Keyword Search Using Query History: this work considered the idea of connecting previous queries to future queries where the queries are unstructured (keyword) queries. In a sense, future queries become "context-aware" by being aware of their earlier querying context. The connecting-the-dots model here uses a dynamic weighted graph that captures previous queries as weights, which evolve as more queries are added to the querying context. Contributions include the formal model, optimization algorithms, and a distributed architecture for scalability.
(iii) Skyline Package Queries: this work developed techniques to support queries whose domain is the powerset of a set of resources and which return elements of that powerset as "packages". This allows for richer and more complex query classes than the traditional set query domains.
Broader Impact. The results of these efforts were disseminated as several publications and demos in top conferences such as the International World Wide Web Conference and the International Semantic Web Conference, as well as in journal publications. One of the papers earned the best paper award at the Joint Semantic Technology Conference 2012. Our work on Scalable Context Aware Search received coverage on technical news sites such as Science360 (NSF news site), Engadget, Science Daily, E!Science News and several others, and formed the basis for an invention disclosure which is currently undergoing review.
The project provided opportunities for training seven graduate thesis students, including two primary and five auxiliary thesis students, four of whom were female. In addition to the publications produced, this project led to 2 doctoral dissertations and 1 Master's thesis. Students went on to be employed by Yahoo, Microsoft and IBM. The project also provided opportunities for training a broader group of graduate students when the research was integrated with the PI's instruction (a graduate seminar class). Students in these classes explored research projects related to compressed path indexing schemes for solving path problems on Semantic Web graphs and to scalable architectures for personalized context management for keyword search. This exposure of the broader graduate community to research led two students who were not originally enrolled in thesis-oriented degrees to convert to thesis-oriented degrees (2 Ph.D., 1 MS) in order to continue the research.
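To make the package query idea from theme (iii) of the report more concrete, the following is a minimal sketch, assuming two hypothetical criteria per resource (a cost to minimize and a rating to maximize) and fixed-size packages. The data, the function names, and the brute-force enumeration are assumptions made for illustration; the enumeration grows exponentially and is not the technique developed in the project.

```python
from itertools import combinations

# Hypothetical resources: name -> (cost, rating).
ITEMS = {"a": (3, 5), "b": (2, 2), "c": (4, 6), "d": (5, 3)}


def packages(items, size):
    """Yield every subset of the given size with its aggregated (cost, rating)."""
    for names in combinations(sorted(items), size):
        cost = sum(items[n][0] for n in names)
        rating = sum(items[n][1] for n in names)
        yield names, (cost, rating)


def dominates(x, y):
    """x dominates y if it costs no more, rates no lower, and differs from y."""
    return x[0] <= y[0] and x[1] >= y[1] and x != y


def skyline_packages(items, size):
    """Return the packages that no other package of the same size dominates."""
    candidates = list(packages(items, size))
    return [
        (names, agg) for names, agg in candidates
        if not any(dominates(other, agg) for _, other in candidates)
    ]
```

With the data above, `skyline_packages(ITEMS, 2)` keeps the packages `("a", "b")`, `("a", "c")` and `("b", "c")`; every package containing `"d"` is dominated by a cheaper, higher-rated alternative. The answer is a set of subsets rather than a set of individual resources, which is what distinguishes this query class from the traditional set query domain.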

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0915865
Program Officer: Vasant G. Honavar
Budget Start: 2009-09-01
Budget End: 2014-01-31
Fiscal Year: 2009
Total Cost: $477,703
Name: North Carolina State University Raleigh
City: Raleigh
State: NC
Country: United States
Zip Code: 27695