9300124 Salton

This is the first year of funding of a three-year continuing award. The project addresses the processing of large collections of heterogeneous text, in which both text length and subject matter vary widely. In such collections, full documents cannot be treated as indivisible units. Instead, users need access to individual text excerpts selected according to their specific requirements. The project is concerned with three main aspects of the design and operation of a flexible full-text environment: content analysis and text indexing for heterogeneous data in unrestricted subject areas; generation of many kinds of linked text structures, in which similar text excerpts can be reached together by following text links; and implementation of sophisticated text utilization methods within such a structured text environment.

Corpus-based text analysis methods are being developed around text matching algorithms that take into account both the global vocabulary overlap between different texts and the similarity of the local contexts in which that vocabulary is used. When similar words are used in similar local contexts, their meanings normally coincide. Hypertext structures linking related text excerpts will then be generated at various levels of detail and used for flexible text traversal, recognition of text themes, and automatic construction of text abstracts and summaries. The availability of large structured text environments is expected to greatly improve the handling of full-text collections and to open up many areas of application in text transformation and text use.
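The matching idea described above, which combines global vocabulary overlap with similarity of local context, can be illustrated with a minimal sketch. It assumes tf-idf term weighting, cosine similarity, a naive regex sentence splitter, and two hypothetical thresholds (`global_min`, `local_min`); none of these choices is specified in the award abstract, and the code is not the project's actual method. Two excerpts are linked only if their whole-excerpt vectors are similar and at least one pair of their sentences is also similar.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase alphabetic tokens; no stemming or stop-word removal (simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def sentences(text):
    # Naive split after '.', '!' or '?' followed by whitespace (assumption).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def idf_table(excerpts):
    # Smoothed inverse document frequency over the excerpt collection.
    n = len(excerpts)
    df = Counter()
    for text in excerpts:
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vector(text, idf):
    # Sparse tf-idf vector: raw term frequency times idf.
    tf = Counter(tokenize(text))
    return {t: f * idf.get(t, 1.0) for t, f in tf.items()}


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def link_excerpts(excerpts, global_min=0.2, local_min=0.3):
    """Return (i, j, global_similarity) for excerpt pairs passing both checks.

    global_min and local_min are hypothetical, illustrative thresholds.
    """
    idf = idf_table(excerpts)
    vecs = [vector(e, idf) for e in excerpts]
    links = []
    for i, j in combinations(range(len(excerpts)), 2):
        g = cosine(vecs[i], vecs[j])
        if g < global_min:
            continue  # global vocabulary overlap too weak
        # Local check: the best-matching sentence pair must also be similar.
        local = max(
            (cosine(vector(a, idf), vector(b, idf))
             for a in sentences(excerpts[i])
             for b in sentences(excerpts[j])),
            default=0.0,
        )
        if local >= local_min:
            links.append((i, j, round(g, 3)))
    return links
```

For example, given two excerpts about hypertext links and one about a budget, only the first two are linked, and the returned pairs serve as the edges of a simple hypertext graph over the excerpts. The sentence-level check is the key design choice in this sketch: it rejects pairs whose shared vocabulary is scattered across unrelated passages, which a whole-excerpt cosine alone could accept.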

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9300124
Program Officer: Maria Zemankova
Budget Start: 1993-08-01
Budget End: 1997-07-31
Fiscal Year: 1993
Total Cost: $208,815
Organization: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850