This project addresses current limitations in automatic information extraction technology. Its specific objectives are to:
1. use bootstrapping techniques to greatly increase both the number of entity and relation types that can be extracted and the rate at which new extractors can be created;
2. improve the performance of supervised training for entity and relation extractors by using bootstrapping to add training features and by applying new supervised learning techniques, including new perceptron and discriminative training techniques;
3. address the metadata issues of provenance, confidence, and temporal extent of facts, focusing in particular on building a model of the expected lifetime of facts from a longitudinal corpus of Web data.

The project will yield scientific understanding and technology for automatic information extraction from free text, making it possible to convert large document collections into formal databases suitable for automated processing. This will significantly enhance the utility and societal benefit of digital libraries and the World Wide Web. Project results will be disseminated through publications and publicly available code for information extraction and for learning extractors.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0308370
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2003-06-15
Budget End: 2006-05-31
Support Year:
Fiscal Year: 2003
Total Cost: $185,241
Indirect Cost:
Name: Massachusetts Institute of Technology
Department:
Type:
DUNS #:
City: Cambridge
State: MA
Country: United States
Zip Code: 02139