Automatic Bayesian Methods In Text Retrieval

Wilbur, Willy

Abstract

A new retrieval model based on the Bayesian approach has been developed; it has interesting connections with the vector models of G. Salton, and its theoretical details have been worked out. Ideally, documents are indexed by the "real" objects they refer to, and these objects become nodes in a system of multiple hierarchies called a specificity network. Each hierarchy is produced by a specificity operator and forms a tree of objects, with the most general object at the root and increasingly specific objects toward the leaves. The objects that populate the nodes are represented by textual terms or phrases, and a single object may have many representations. The model is labor-intensive to construct if each document must be converted by hand into a form that represents the objects discussed in it. We are therefore developing methods for automatically extracting object representations, which should make document representation a tractable task. Our major effort is to develop machine learning methods that can aid in constructing the hierarchies described here.
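
As an illustration of the data structure described in the abstract, the following is a minimal Python sketch of a specificity network: a set of named hierarchies, each a tree of objects whose nodes carry one or more textual representations. The class name `ObjectNode`, the function `build_term_index`, and the small anatomy example are assumptions introduced here for illustration only; the project's actual representation and its specificity operators are not described in this record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ObjectNode:
    """One "real" object in a hierarchy (hypothetical structure)."""
    name: str
    # Alternative terms or phrases that denote the same object.
    representations: Set[str] = field(default_factory=set)
    # More specific objects one level below this one.
    children: List["ObjectNode"] = field(default_factory=list)


def build_term_index(
    hierarchies: Dict[str, ObjectNode],
) -> Dict[str, List[Tuple[str, str, int]]]:
    """Map each lowercased term to (hierarchy, object name, depth) entries.

    Depth 0 is the root (most general); larger depths are more specific.
    """
    index: Dict[str, List[Tuple[str, str, int]]] = {}
    for hierarchy_name, root in hierarchies.items():
        stack = [(root, 0)]
        while stack:
            node, depth = stack.pop()
            # The object's own name counts as one of its representations.
            for term in node.representations | {node.name}:
                entry = (hierarchy_name, node.name, depth)
                index.setdefault(term.lower(), []).append(entry)
            stack.extend((child, depth + 1) for child in node.children)
    return index


# Hypothetical example data: one hierarchy with three levels of specificity.
left_ventricle = ObjectNode("left ventricle", {"LV"})
heart = ObjectNode("heart", {"cor"}, [left_ventricle])
organ = ObjectNode("organ", set(), [heart])

index = build_term_index({"anatomy": organ})
print(index["lv"])
```

In this sketch, looking up a term found in a document returns every object it may represent, together with how specific that object is within each hierarchy. Because the index maps terms to lists, one term can point to objects in several hierarchies, and one object can be reached through several terms.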

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000021-13
Application #: 6988448
Study Section: (CBB)

Support Year: 13
Fiscal Year: 2004

Institution

Name: National Library of Medicine
Country: United States

Publications

Wilbur, W John; Kim, Won (2009) The Ineffectiveness of Within-Document Term Frequency in Text Classification. Inf Retr Boston 12:509-525

Lu, Zhiyong; Kim, Won; Wilbur, W John (2009) Evaluating relevance ranking strategies for MEDLINE retrieval. J Am Med Inform Assoc 16:32-6

Lin, Jimmy; Wilbur, W John (2007) PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics 8:423

Wilbur, W John; Kim, Won; Xie, Natalie (2006) Spelling correction in the PubMed search engine. Inf Retr Boston 9:543-564

Kim, W; Wilbur, W J (2001) Amino acid residue environments and predictions of residue type. Comput Chem 25:411-22

Aronson, A R; Bodenreider, O; Chang, H F et al. (2000) The NLM Indexing Initiative. Proc AMIA Symp :17-21

Wilbur, W J (2000) Boosting naïve Bayesian learning on a large subset of MEDLINE. Proc AMIA Symp :918-22

Wilbur, W J; Neuwald, A F (2000) A theory of information with special application to search problems. Comput Chem 24:33-42

Wilbur, W J; Hazard Jr, G F; Divita, G et al. (1999) Analysis of biomedical text for chemical names: a comparison of three methods. Proc AMIA Symp :176-80
