File Structuring and Information Retrieval for Large Full Text Libraries

Buckley, Christopher; Salton, Gerard

Abstract

This is the first-year funding of a three-year continuing award. The project addresses the processing of large collections of heterogeneous text in which both text length and subject matter vary widely. In such circumstances, the integrity of full documents cannot be maintained; instead, access must be provided to individual text excerpts in accordance with specific user requirements. The project is concerned with three main aspects of the design and operation of a flexible full-text environment: content analysis and text indexing for heterogeneous data in unrestricted subject areas; the generation of linked text structures of many kinds, in which similar text excerpts are jointly accessible by following the text links; and the implementation of sophisticated text utilization methods in such a structured text environment. Corpus-based text analysis methods are under development, based on text matching algorithms that account both for global vocabulary coincidences between different texts and for similarities in the local environment in which the vocabulary is used. When similar words are used in similar local contexts, their meanings are normally congruent. Hypertext structures of relatable text excerpts must then be generated at various levels of detail, and these linked text structures must be utilized for flexible text traversal, the recognition of text themes, and the automatic construction of text abstracts and summaries. The availability of large structured text environments is expected to substantially improve the manipulation of full-text collections and to open up many areas of application in text transformation and text use.
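The combination of global and local matching described in the abstract can be illustrated with a minimal sketch. It is not the project's actual algorithm: the function names, the use of raw term-frequency cosine similarity, the sentence-level definition of "local context", and both thresholds are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (hypothetical helper)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentences(text):
    """Naive sentence splitter; sentences stand in for 'local context'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def best_local_match(text_a, text_b):
    """Highest similarity over all sentence pairs of the two texts."""
    return max(
        (cosine(term_vector(sa), term_vector(sb))
         for sa in sentences(text_a)
         for sb in sentences(text_b)),
        default=0.0,
    )


def should_link(text_a, text_b, global_min=0.2, local_min=0.5):
    """Link two texts only if both the global vocabulary overlap and at
    least one local (sentence-level) match clear their assumed thresholds."""
    global_sim = cosine(term_vector(text_a), term_vector(text_b))
    return global_sim >= global_min and best_local_match(text_a, text_b) >= local_min


finance_1 = "The bank raised interest rates. Savers were pleased."
finance_2 = "The bank raised interest rates again. Borrowers complained."
river = "The river bank flooded. Fish swam upstream."

print(should_link(finance_1, finance_2))  # shared words in a shared local context
print(should_link(finance_1, river))      # shared words, different local context
```

In this sketch the two finance texts are linked, while the river text is rejected even though it shares the word "bank" and passes the global check, because no pair of sentences is similar enough locally.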

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9300124
Program Officer: Maria Zemankova

Budget Start: 1993-08-01
Budget End: 1997-07-31
Fiscal Year: 1993
Total Cost: $208,815

Institution

Cornell University, Ithaca, NY, United States
