The World-Wide Web has the potential to be the world's largest knowledge base if its unique characteristics are properly utilized. While superficially similar to early hypertext and other information retrieval domains, the Web has important quantitative and qualitative differences that necessitate new tools and techniques. The quantitative differences are obvious: the tremendous size of the Web precludes storing a complete and up-to-date copy in a conventional database. The qualitative differences between the Web and earlier information retrieval domains are more subtle but no less important, and include the various types of semi-structured information on the Web. Unifying the Web with relational databases, so that it can be accessed through Structured Query Language (SQL) and other database tools, would apply one of the most popular data management tools to this challenging and important domain. A prototype implementation of just-in-time databases and the Squeal query environment appeared in the investigator's dissertation, showing that the Web could be queried as though it were in a local relational database. This was achieved by automatically retrieving information from the Web only when needed, which allowed the construction of several simple but powerful structure-based information retrieval applications.
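The idea of retrieving Web data only when a query needs it can be illustrated with a minimal sketch. This is not the Squeal implementation or its query syntax, which are not given here. The class `JustInTimeLinks`, the table names `link` and `fetched`, the `FAKE_WEB` dictionary, and the example URLs are all hypothetical, and an in-memory dictionary stands in for real HTTP retrieval so the example runs offline.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical stand-in for the Web: URL -> HTML text.
# A real system would retrieve pages over HTTP instead.
FAKE_WEB = {
    "http://example.org/a": '<a href="http://example.org/b">B</a> '
                            '<a href="http://example.org/c">C</a>',
    "http://example.org/b": '<a href="http://example.org/a">A</a>',
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class JustInTimeLinks:
    """A local relational table of hyperlinks that is filled lazily."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable: url -> HTML text
        self.fetch_count = 0    # number of pages actually retrieved
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE link (source_url TEXT, destination_url TEXT)")
        self.db.execute("CREATE TABLE fetched (url TEXT PRIMARY KEY)")

    def _ensure_loaded(self, url):
        # Retrieve and parse a page only the first time it is asked about.
        seen = self.db.execute(
            "SELECT 1 FROM fetched WHERE url = ?", (url,)).fetchone()
        if seen:
            return
        self.fetch_count += 1
        parser = LinkParser()
        parser.feed(self.fetch(url))
        self.db.executemany(
            "INSERT INTO link VALUES (?, ?)",
            [(url, dest) for dest in parser.links])
        self.db.execute("INSERT INTO fetched VALUES (?)", (url,))

    def links_from(self, url):
        # Answer with ordinary SQL once the needed rows are present.
        self._ensure_loaded(url)
        rows = self.db.execute(
            "SELECT destination_url FROM link "
            "WHERE source_url = ? ORDER BY destination_url", (url,))
        return [row[0] for row in rows]


jit = JustInTimeLinks(lambda url: FAKE_WEB.get(url, ""))
print(jit.links_from("http://example.org/a"))
```

In this sketch the first query for a page triggers one retrieval, and later queries for the same page are answered from the local table. Refresh policies, pre-fetching, and sharing among users are left out entirely.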

This proposal is to build a powerful and efficient relational database interface to the Web that will be a valuable tool for the information retrieval community. The interface will support uniform access to many types of structural and other information on the Web, including inter-document structure, intra-document structure, forms, cookies, and new features as they arise. Efficient performance will require developing query optimization techniques, pre-fetch and refresh strategies, and techniques for sharing data among multiple users. As the system progresses, novel information retrieval applications will be developed and evaluated, to gain a better understanding of the benefits and limitations of applying different types of available information to various information retrieval problems. Throughout the process, the system itself will be evaluated, to better understand the powers and limits of providing a relational database interface to the Web.

The primary educational goal is to increase computer science opportunities for women, an area in which the investigator has extensive experience. The investigator is well positioned for this work as a faculty member at a women's college with a strong undergraduate program and two graduate programs: a certificate program for people with college degrees who wish to change fields, and an interdisciplinary master's program for combining computing with another area of experience. The investigator's plans (already in progress) include:

Supervising undergraduate research, with the goal of preparing and motivating students for graduate education.

Completely revising the architecture and operating systems sequence, including the creation of a hardware laboratory.

Taking advantage of the school's Bay Area location to forge ties with industry, leading to speakers, internships, permanent employment, and possible donations.

Running a mentoring program for Mills students to prepare them for graduate school and employment.

Recruiting transfer students from nearby community colleges, which has the added benefit of reaching economically underprivileged women.

Improving the interdisciplinary master's program to more fully integrate students' other areas of interest with computer science.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9876309
Program Officer: Ephraim P. Glinert
Budget Start: 1999-04-15
Budget End: 2004-12-31
Fiscal Year: 1998
Total Cost: $212,000
Name: Mills College
City: Oakland
State: CA
Country: United States
Zip Code: 94613