The goal of this research is to investigate the relationship between the occurrence of significant topics in a document and the document's structure. Its unique contribution lies in the combination of methods used to produce a list of significant topics: statistical and rule-based techniques that identify term variants as a function of their distribution in focus areas of documents. Applications that can employ these methods include information retrieval, passage retrieval, relevance feedback, information extraction, and summarization. The results can be used directly in ongoing research projects on automatic summarization of documents that use both statistical and information extraction techniques, i.e., that combine information retrieval (IR) and natural language processing (NLP). To the extent that these techniques are based on linguistically motivated patterns rather than domain-dependent vocabularies, the patterns should apply to general text. The approach will be applied to several domains to test its generality and applicability across document types, which will make it possible to measure the cost of porting across genres. Formative and summative evaluation procedures will be developed and performed at each step of the analysis. This research is undertaken in the context of the Digital Library Research Program at Columbia University, in conjunction with the Center for Research on Information Access. The resulting techniques, grounded in the novel combination and cross-fertilization of IR and NLP methods, are expected to improve information access based on significant topics across domains and genres.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9712069
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 1997-09-15
Budget End: 2000-08-31
Support Year:
Fiscal Year: 1997
Total Cost: $270,314
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027