Unsupervised, Non-stop Extraction of Information from the World Wide Web

Etzioni, Oren; Soderland, Stephen

Abstract

A long-standing dream of Artificial Intelligence has been to create an autonomous learner that continuously increases its knowledge by reading a wide variety of texts. To date, this kind of knowledge acquisition has been attempted only rarely, and typically at small scale. Yet the Web has made a vast library of online text readily accessible.

In response, this project investigates a family of unsupervised, domain-independent, scalable systems that learn from the Web in an open-ended fashion. Such systems not only extract information but also extend their ontology, incorporating new classes and relations. Furthermore, the systems' learning is recursive: once new relations are learned, the systems build on them to learn further relations, relations between relations, and so on. The project investigates the automatic control of this process and analyzes both the power and the limitations of this form of learning.

The project will impact both the natural language processing (NLP) and machine learning (ML) communities by investigating fundamental issues in learning from text. In addition, the systems developed could lead to a new generation of Web search engines that improve information access, achieving a broad societal and economic impact.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0535284
Program Officer: Maria Zemankova

Project Start
Project End
Budget Start: 2006-03-01
Budget End: 2009-02-28
Support Year
Fiscal Year: 2005
Total Cost: $492,000
Indirect Cost

University of Washington, Seattle, WA, United States
