CAREER: Building and Searching a Structured Web Database

Cafarella, Michael

Abstract

This project investigates techniques for extracting and searching Web-embedded structured datasets. For example, a manufacturer's site may contain technical product data, and a governmental site may contain economic statistics. Unfortunately, such data can be hard to isolate from surrounding text, and difficult to find using existing search engines that focus exclusively on documents. The approach for the extraction step is to use current incomplete datasets to induce a large "portfolio" of possible extractors, apply all of them to crawled Web content, then test which are most successful. The approach for the search step is to examine user query logs to find common patterns that describe the relationship between topic words and words that describe the dataset's structure; e.g., "endangered species near the Mississippi River" is a prototype for a many-to-many geographic relationship. The central goal of this work is to eventually construct a working search engine for the structured-data component of the Web.

The success of this project is likely to increase access to structured datasets for a very broad population of users. The project will also yield a large amount of novel extracted data relevant to scientific research, plus useful tools and query logs. To accompany the research program, this project involves an educational plan that includes revised undergraduate course material, development of online educational material surrounding the datasets and tools, and a course on Web topics taught at a local rural high school. All project results will be distributed through the project's Web site (www.eecs.umich.edu/~michjc/structuredweb/index.html).

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1054913
Program Officer: Maria Zemankova

Budget Start: 2011-03-15
Budget End: 2017-02-28
Fiscal Year: 2010
Total Cost: $504,548

Institution

Regents of the University of Michigan - Ann Arbor, Ann Arbor, MI, United States
