This project investigates web querying techniques for accessing web information resources, where an information resource is a large web-accessible collection such as the ACM Digital Library. We propose a semantics-based way of accessing such a resource: extract metadata about topics and relationships from it, extend the metadata with "importance scores", store it in a database, and query it there. The query language is extended with constructs (a) to propagate importance scores to the query output so that the output can be ranked, and (b) to define "stopping conditions" that reduce query evaluation time. For some query requests, the metadata in the database may not be sufficient to answer the query. Our research direction is therefore to locate more informative answers by mixing database querying with "focused crawling" of the web information resource, at the level of the algebraic operators of SQL queries. These queries allow time constraints and relax the closed-world assumption, which makes it necessary to redefine the notion of a well-defined query. Data extraction techniques will be employed to extract the metadata. Variations of the basic approach that do not require changes to the database query engine itself will also be evaluated. Standalone web applications will be developed and made available.
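The two query-language extensions described above, score propagation and stopping conditions, can be illustrated with a minimal Python sketch. It is not the project's implementation. The names `Scored`, `scored_join`, and `top_k_with_deadline` are hypothetical, multiplying input scores is only one assumed propagation rule, and a wall-clock time budget is only one assumed form of stopping condition.

```python
import heapq
import time
from typing import Iterable, Iterator, List, NamedTuple


class Scored(NamedTuple):
    row: tuple    # the metadata tuple itself
    score: float  # assumed importance score in [0, 1]


def scored_join(left: Iterable[Scored], right: List[Scored],
                key_left: int, key_right: int) -> Iterator[Scored]:
    """Nested-loop equi-join that propagates scores to its output.

    Assumption: the output score is the product of the input scores.
    """
    for l in left:
        for r in right:
            if l.row[key_left] == r.row[key_right]:
                yield Scored(l.row + r.row, l.score * r.score)


def top_k_with_deadline(results: Iterable[Scored], k: int,
                        time_budget_s: float) -> List[Scored]:
    """Keep the k best-scored results, stopping once the time budget is spent."""
    deadline = time.monotonic() + time_budget_s
    best: List[Scored] = []  # min-heap ordered by score
    for item in results:
        if time.monotonic() > deadline:  # assumed stopping condition
            break
        entry = Scored(item.row, item.score)
        if len(best) < k:
            heapq.heappush(best, (entry.score, entry))
        elif entry.score > best[0][0]:
            heapq.heapreplace(best, (entry.score, entry))
    return [entry for _, entry in sorted(best, reverse=True)]


# Hypothetical metadata: (topic_id, topic_name) and (paper_id, topic_id).
topics = [Scored(("t1", "query optimization"), 0.9),
          Scored(("t2", "focused crawling"), 0.6)]
papers = [Scored(("p1", "t1"), 0.8),
          Scored(("p2", "t1"), 0.5),
          Scored(("p3", "t2"), 0.7)]

ranked = top_k_with_deadline(scored_join(topics, papers, 0, 1),
                             k=2, time_budget_s=1.0)
for result in ranked:
    print(result.row, round(result.score, 2))
```

In this sketch the ranking falls out of the propagated scores, and the deadline check bounds evaluation time at the cost of possibly returning an incomplete answer, which is one way to read the relaxed closed-world assumption mentioned above.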

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0312200
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2003-08-01
Budget End: 2007-07-31
Support Year:
Fiscal Year: 2003
Total Cost: $281,000
Indirect Cost:
Name: Case Western Reserve University
Department:
Type:
DUNS #:
City: Cleveland
State: OH
Country: United States
Zip Code: 44106