Theme recognition in document collections.

Wilbur, Willy

Abstract

Many methods have been investigated for clustering sets of documents in the hope of improving retrieval, but they have generally failed to deliver that improvement. Part of the problem is that a given document often involves more than one subject, so the documents cannot be cleanly partitioned into mutually exclusive categories. To overcome this problem we have developed methods designed to identify a theme among a set of documents. The theme need not encompass the whole of any document; it only needs to exist in some subset of the documents to be identifiable. Some of these same documents may participate in the definition of several themes. The method of finding themes is based on the EM (expectation-maximization) algorithm and uses an iterative procedure that converges to themes. The work is in the testing stage of development and several models are being considered.

Keywords: expectation maximization, Bayesian statistics
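
The abstract does not specify the models under consideration, so the following is only an illustrative sketch of how an EM iteration can pull one theme out of a document set, not the project's actual method. It assumes a two-component Bernoulli mixture ("theme" versus "background") over term presence, a hypothetical seed term to break the initial symmetry, and additive smoothing; the function name `find_theme`, its parameters, and the toy documents are all invented for illustration.

```python
import math


def find_theme(docs, seed_term, iterations=50, smoothing=0.5):
    """Illustrative EM for a two-component Bernoulli mixture (theme vs. background).

    docs: list of sets of terms. seed_term: assumed starting hint for the theme.
    Returns (membership, weights): soft theme membership per document, and
    (term, log-odds) pairs sorted from most to least theme-specific.
    """
    vocab = sorted(set().union(*docs))
    # Initial soft memberships: documents containing the seed lean toward the theme.
    resp = [0.9 if seed_term in d else 0.1 for d in docs]
    p, q = {}, {}
    for _ in range(iterations):
        # M-step: re-estimate term probabilities under theme (p) and background (q).
        n_theme = sum(resp)
        n_back = len(docs) - n_theme
        prior = min(max(n_theme / len(docs), 1e-6), 1 - 1e-6)
        for t in vocab:
            in_theme = sum(r for r, d in zip(resp, docs) if t in d)
            in_back = sum(1 - r for r, d in zip(resp, docs) if t in d)
            p[t] = (in_theme + smoothing) / (n_theme + 2 * smoothing)
            q[t] = (in_back + smoothing) / (n_back + 2 * smoothing)
        # E-step: posterior probability that each document takes part in the theme.
        new_resp = []
        for d in docs:
            log_odds = math.log(prior) - math.log(1 - prior)
            for t in vocab:
                if t in d:
                    log_odds += math.log(p[t]) - math.log(q[t])
                else:
                    log_odds += math.log(1 - p[t]) - math.log(1 - q[t])
            # Clamp so that exp() cannot overflow on extreme log-odds.
            log_odds = max(-50.0, min(50.0, log_odds))
            new_resp.append(1 / (1 + math.exp(-log_odds)))
        resp = new_resp
    weights = sorted(
        ((t, math.log(p[t]) - math.log(q[t])) for t in vocab),
        key=lambda item: item[1],
        reverse=True,
    )
    return resp, weights


# Toy collection (invented): three documents share a gene-related theme.
docs = [
    {"gene", "expression", "microarray"},
    {"gene", "microarray", "retrieval"},
    {"gene", "expression", "cluster"},
    {"retrieval", "query", "index"},
    {"query", "index", "ranking"},
    {"retrieval", "ranking", "cluster"},
]
membership, weights = find_theme(docs, seed_term="gene")
```

Because membership is a per-document probability rather than a hard assignment, rerunning the sketch with a different seed term can yield a second theme whose supporting documents overlap with the first, which is the behaviour the abstract asks for.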

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000089-01
Application #: 6228046
Study Section: Special Emphasis Panel (CBB)

Project Start
Project End
Budget Start
Budget End
Support Year: 1
Fiscal Year: 1999
Total Cost
Indirect Cost

Institution

Name: National Library of Medicine
Department
Type
DUNS #

City
State
Country: United States
Zip Code

Related projects


NIH 2008 Z01 LM	General and Semi-supervised Machine Learning Applied to Bioinformatics	Wilbur, Willy John / National Library of Medicine	$86,215
NIH 2007 Z01 LM	Theme Recognition In Document Collections.	Wilbur, Willy John / National Library of Medicine	$52,962
NIH 2006 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2005 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2004 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2003 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2002 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2001 Z01 LM	Theme Recognition In Document Collections	Wilbur, Willy John / National Library of Medicine
NIH 2000 Z01 LM	Theme recognition in document collections.	Wilbur, Willy John / National Library of Medicine
NIH 1999 Z01 LM	Theme recognition in document collections.	Wilbur, Willy John / National Library of Medicine

Publications

Kim, Won; Wilbur, W John (2005) A strategy for assigning new concepts in the MEDLINE database. AMIA Annu Symp Proc :395-9

Shatkay, H; Edwards, S; Wilbur, W J et al. (2000) Genes, themes and microarrays: using information retrieval for large-scale gene analysis. Proc Int Conf Intell Syst Mol Biol 8:317-28
