Unsupervised Document Set Exploration Using Divisive Partitioning

Boley, Daniel

Abstract

The purpose of this project is to develop algorithms and tools for exploring and categorizing extremely large bodies of documents, especially documents from the World Wide Web. The technical approach is based on a new hierarchical divisive partitioning method that, in preliminary tests, has produced high-quality clusters very quickly. The research issues to be addressed include scalability analysis, theoretical foundations, incremental updating methods, generalizations (such as handling missing values and different scalings), and interfaces to one or more Web agents for various applications. Given the interdisciplinary nature of the project, educational seminars and tutorials are a natural part of it. The anticipated results are a set of algorithms and tools for organizing large document collections that offer (1) scalability to very large datasets, (2) unsupervised operation, and (3) reasonable quality and usefulness of the categories found. The anticipated benefits include an order-of-magnitude increase in the size of datasets from which useful categories can practically be extracted in an unsupervised manner. Potential applications include client-side aids for organizing and searching the WWW, server-side aids for creating consistent document ratings, and tools for maintaining and updating the organization and classification of the contents of specialized databases, all with a minimum of human intervention.

Project page: www.cs.umn.edu/~boley/PDDP.html
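The abstract does not say how the divisive partitioning works. As a rough illustration only, the following sketch assumes a principal-direction split, which is what the PDDP name in the project URL suggests. Each cluster's document vectors are centered on their mean, and the cluster is bisected by the sign of each document's projection onto the leading principal direction, found here by plain power iteration. The function names, the iteration count, and the rule of always splitting the leaf with the largest scatter are hypothetical choices made for this sketch, not details taken from the project.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def centroid(docs):
    """Mean vector of a list of document vectors."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def scatter(docs):
    """Total squared distance of the documents from their centroid."""
    w = centroid(docs)
    return sum(sum((x - m) ** 2 for x, m in zip(d, w)) for d in docs)


def principal_split(docs, iters=200):
    """Bisect docs by the sign of their projection onto the leading
    principal direction of the centered document vectors (assumed rule)."""
    w = centroid(docs)
    centered = [[x - m for x, m in zip(d, w)] for d in docs]
    # Start power iteration from the centered document farthest from the mean.
    u = max(centered, key=lambda c: dot(c, c))
    norm = math.sqrt(dot(u, u))
    if norm == 0:  # all documents identical: nothing to split
        return list(range(len(docs))), []
    u = [x / norm for x in u]
    for _ in range(iters):
        # One power-iteration step on C C^T: u <- sum_i c_i * (c_i . u)
        proj = [dot(c, u) for c in centered]
        new = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(len(u))]
        norm = math.sqrt(dot(new, new))
        if norm == 0:
            break
        u = [x / norm for x in new]
    proj = [dot(c, u) for c in centered]
    left = [i for i, p in enumerate(proj) if p <= 0]
    right = [i for i, p in enumerate(proj) if p > 0]
    return left, right


def divisive_partition(docs, k):
    """Split docs top-down into at most k leaves, returned as index lists."""
    leaves = [list(range(len(docs)))]
    while len(leaves) < k:
        # Hypothetical selection rule: split the leaf with the largest scatter.
        leaves.sort(key=lambda idx: scatter([docs[i] for i in idx]))
        target = leaves.pop()
        left, right = principal_split([docs[i] for i in target])
        if not left or not right:  # leaf cannot be split further
            leaves.append(target)
            break
        leaves.append([target[i] for i in left])
        leaves.append([target[i] for i in right])
    return leaves
```

For example, with two well-separated groups of term vectors such as `[[5, 0], [5.1, 0], [0, 5], [0, 5.2]]`, `divisive_partition(docs, 2)` returns the indices `[0, 1]` and `[2, 3]` as separate leaves. Power iteration is used instead of a library SVD only to keep the sketch dependency-free; for large sparse term-document matrices, a sparse truncated SVD routine would be the practical choice.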

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9811229
Program Officer: Maria Zemankova

Budget Start: 1998-09-15
Budget End: 2002-08-31
Fiscal Year: 1998
Total Cost: $185,019

Institution

University of Minnesota Twin Cities, Minneapolis, MN, United States