The goal of this project is to develop new algorithms for text categorization, text segmentation, and text summarization based on a natural language processing technique called "information extraction". Information extraction techniques provide a level of linguistic analysis that word-based information retrieval systems do not support, yet they are more robust and scalable than in-depth natural language processing techniques. Information extraction is particularly well suited to text categorization because many categorization problems require identifying role relationships and contextual distinctions that keyword analysis cannot capture. The project involves building a multi-faceted text categorization system that supports several text processing capabilities, including multi-class categorization, topic segmentation, and domain-specific text summarization. The objective is to achieve good performance across multiple text corpora and different category sets. This research represents a new approach to text categorization, a task for which many businesses and government agencies have critical applications, including related tasks such as text routing and filtering.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9509820
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 1995-09-01
Budget End: 1999-08-31
Support Year:
Fiscal Year: 1995
Total Cost: $217,526
Indirect Cost:
Name: University of Utah
Department:
Type:
DUNS #:
City: Salt Lake City
State: UT
Country: United States
Zip Code: 84112