The purpose of this project is to develop algorithms and tools for exploring and categorizing extremely large bodies of documents, especially those on the World Wide Web. The technical approach is based on a new hierarchical divisive partitioning method that has produced good-quality clusters quickly in preliminary tests. The research issues to be addressed include scalability analysis, theoretical foundations, incremental updating methods, generalizations (such as handling missing values and different scaling), and interfaces to one or more Web agents for various applications. Educational seminars and tutorials are a natural part of this project, given its interdisciplinary nature. The anticipated results are a set of algorithms and tools for organizing large document collections that offer (1) scalability to very large datasets, (2) unsupervised operation, and (3) reasonable quality and usefulness of the categories found. Anticipated benefits include an order-of-magnitude increase in the size of datasets from which useful categories can practically be extracted in an unsupervised manner. Potential applications include client-side WWW organization and search aids, server-side aids for creating document ratings in a consistent manner, and tools for maintaining and updating the organization and classification of the contents of specialized databases, all with a minimum of human intervention. www.cs.umn.edu/~boley/PDDP.html
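
The abstract does not spell out the partitioning rule, so the following is only a minimal sketch of what a hierarchical divisive partitioning of documents could look like. It assumes that each document is a term-count vector, that a cluster is split by the sign of each centered document's projection onto the cluster's leading principal direction (an assumption suggested by the PDDP name in the project URL), and that the cluster with the largest scatter is split next. The function names, the power-iteration solver, and the toy data are all hypothetical and are not taken from the project's software.

```python
import math
import random


def centered(docs, idx):
    """Return the rows of `docs` selected by `idx`, with their mean subtracted."""
    dim = len(docs[0])
    mean = [sum(docs[i][j] for i in idx) / len(idx) for j in range(dim)]
    return [[docs[i][j] - mean[j] for j in range(dim)] for i in idx]


def scatter(docs, idx):
    """Total squared distance of the cluster's members from their mean."""
    return sum(x * x for row in centered(docs, idx) for x in row)


def principal_direction(rows, iters=200, seed=0):
    """Approximate the leading principal direction of centered `rows`.

    Uses plain power iteration on A^T A (assumed solver, chosen for brevity).
    """
    dim = len(rows[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Compute w = A^T (A v) without forming the matrix A^T A.
        proj = [sum(x * y for x, y in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all rows are zero: no direction to find
        v = [x / norm for x in w]
    return v


def split(docs, idx):
    """Split one cluster in two by the sign of the principal projection."""
    rows = centered(docs, idx)
    v = principal_direction(rows)
    left, right = [], []
    for i, row in zip(idx, rows):
        side = left if sum(x * y for x, y in zip(row, v)) <= 0 else right
        side.append(i)
    return left, right


def divisive_clusters(docs, k):
    """Split repeatedly, starting from one cluster, until k clusters exist."""
    clusters = [list(range(len(docs)))]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        # Assumed selection rule: split the cluster with the largest scatter.
        target = max(candidates, key=lambda c: scatter(docs, c))
        left, right = split(docs, target)
        if not left or not right:
            break  # the cluster could not be separated (e.g. identical rows)
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


# Toy term-count vectors: documents 0-2 share one vocabulary, 3-5 another.
docs = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [5, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
    [0, 0, 5, 5],
]
clusters = divisive_clusters(docs, 2)
```

Each split in this sketch needs only matrix-vector products with the cluster's document vectors, which is one reason a method of this kind could plausibly scale to large, sparse collections. No labels or training data are used, in keeping with the unsupervised operation described above.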