A long-standing dream of Artificial Intelligence has been to create an autonomous learner that continuously increases its knowledge by reading a wide variety of texts. To date, this kind of knowledge acquisition has been attempted only rarely, and typically at small scale. Yet the Web has made a vast library of online text readily accessible.
In response, this project investigates a family of unsupervised, domain-independent, scalable systems that learn from the Web in an open-ended fashion. Such systems not only extract information but also extend their ontology, incorporating new classes and relations. Furthermore, the systems' learning is recursive: once a system has learned new relations, it builds on them to learn further relations, relations between relations, and so on. The project investigates the automatic control of this process, and analyzes both the power and the limitations of this form of learning.
The project will impact both the natural language processing (NLP) and machine learning (ML) communities by investigating fundamental issues in learning from text. In addition, the systems developed could lead to a new generation of Web search engines that improve information access, with broad societal and economic impact.