Testing and Exploiting Clustering for Data Compression

Bookstein, Abraham

Abstract

This is the first-year funding of a three-year continuing award. This project develops techniques for compressing concordances of large, full-text databases. The practical significance is clear: concordances are huge, consuming as many resources as the data themselves, yet they are necessary to access the database efficiently. The theoretical implications are also important, since the highly structured organization of concordances makes them suitable for modeling. This project models clustering in concordances. Sequential clustering is important in Information Retrieval generally: substantive terms tend to occur together in a document, and documents containing a given term often cluster in a typical database. The project develops and evaluates statistical tests that indicate when clustering is important; identifies measures of sequential clustering strength; and creates models of concordance generation that recognize clustering and thereby improve compression effectiveness. The models studied include Markov models and Bayesian learning models. Sequential clustering is widespread, and the results of this research should have implications well beyond data compression, for example in analyzing term occurrence to identify content-bearing terms for retrieval purposes. Thus this project promises direct benefits in improving our ability to store very large textual databases and, indirectly, in developing methodology of wider interest.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9307895
Program Officer: Program Director

Project Start
Project End
Budget Start: 1993-08-01
Budget End: 1996-12-31
Support Year
Fiscal Year: 1993
Total Cost: $122,929
Indirect Cost

Institution

University of Chicago, Chicago, IL, United States