Automatic Bayesian Methods In Text Retrieval

Wilbur, Willy

Abstract

Current work on the project focuses on developing an improved Bayesian classification model and on developing new approaches to active learning with a Bayesian model.

1) We have found through extensive testing that our version of naive Bayes, a form of the multivariate Bernoulli model (MBM), is at least as effective as the multinomial model (MM). The MM attempts to extract information from local feature counts in text documents. We have developed what we call a Stacked MBM model, which shows that there is not sufficient independent information in the local counts to make a significant improvement in performance.

2) We have developed term-based active learning methods, which provide a different approach to active learning, and have shown that they are in many cases more effective than simple uncertainty sampling or error reduction sampling.

3) We have developed an example selection method that is highly effective at improving naive Bayes on all of MEDLINE. This is important because few methods can realistically be applied to all of MEDLINE.
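The difference between the two models named in item 1 can be illustrated with a minimal sketch. This is not the project's implementation: the toy documents, the labels, the add-one smoothing, and all function names below are assumptions chosen for illustration. The MBM scores a document only by which vocabulary terms are present or absent, while the MM also uses how often each term occurs within the document.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the project): token lists with class labels.
docs = [["gene", "protein", "gene"], ["protein", "binding"],
        ["patient", "trial"], ["trial", "trial", "dose"]]
labels = ["bio", "bio", "clin", "clin"]
vocab = sorted({t for d in docs for t in d})

def train_bernoulli(docs, labels, vocab):
    """MBM: estimate P(term occurs at least once | class), add-one smoothed."""
    model = {}
    for c in set(labels):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n = len(class_docs)
        p = {t: (sum(t in d for d in class_docs) + 1) / (n + 2) for t in vocab}
        model[c] = (math.log(n / len(docs)), p)
    return model

def score_bernoulli(model, doc, vocab):
    """Log score per class; every vocabulary term contributes, present or absent."""
    present = set(doc)
    scores = {}
    for c, (log_prior, p) in model.items():
        s = log_prior
        for t in vocab:
            s += math.log(p[t]) if t in present else math.log(1.0 - p[t])
        scores[c] = s
    return scores

def train_multinomial(docs, labels, vocab):
    """MM: estimate P(term | class) from total term counts, add-one smoothed."""
    model = {}
    for c in set(labels):
        counts, n = Counter(), 0
        for d, y in zip(docs, labels):
            if y == c:
                counts.update(t for t in d if t in vocab)
                n += 1
        total = sum(counts.values())
        p = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
        model[c] = (math.log(n / len(docs)), p)
    return model

def score_multinomial(model, doc, vocab):
    """Log score per class; each occurrence of a term contributes once."""
    tf = Counter(t for t in doc if t in vocab)
    return {c: log_prior + sum(k * math.log(p[t]) for t, k in tf.items())
            for c, (log_prior, p) in model.items()}

mbm = train_bernoulli(docs, labels, vocab)
mm = train_multinomial(docs, labels, vocab)

once, thrice = ["gene"], ["gene", "gene", "gene"]
mbm_once, mbm_thrice = score_bernoulli(mbm, once, vocab), score_bernoulli(mbm, thrice, vocab)
mm_once, mm_thrice = score_multinomial(mm, once, vocab), score_multinomial(mm, thrice, vocab)
```

In this sketch, repeating a term within a document leaves the MBM scores unchanged but shifts the MM scores, which is the within-document count information the abstract refers to.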

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000021-15
Application #: 7316226
Study Section: (CBB)

Project Start
Project End
Budget Start
Budget End
Support Year: 15
Fiscal Year: 2006
Total Cost
Indirect Cost

Institution

Name: National Library of Medicine
Department
Type
DUNS #

City
State
Country: United States
Zip Code

Publications

Wilbur, W John; Kim, Won (2009) The Ineffectiveness of Within-Document Term Frequency in Text Classification. Inf Retr Boston 12:509-525

Lu, Zhiyong; Kim, Won; Wilbur, W John (2009) Evaluating relevance ranking strategies for MEDLINE retrieval. J Am Med Inform Assoc 16:32-6

Lin, Jimmy; Wilbur, W John (2007) PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics 8:423

Wilbur, W John; Kim, Won; Xie, Natalie (2006) Spelling correction in the PubMed search engine. Inf Retr Boston 9:543-564

Kim, W; Wilbur, W J (2001) Amino acid residue environments and predictions of residue type. Comput Chem 25:411-22

Aronson, A R; Bodenreider, O; Chang, H F et al. (2000) The NLM Indexing Initiative. Proc AMIA Symp :17-21

Wilbur, W J (2000) Boosting naive Bayesian learning on a large subset of MEDLINE. Proc AMIA Symp :918-22

Wilbur, W J; Neuwald, A F (2000) A theory of information with special application to search problems. Comput Chem 24:33-42

Wilbur, W J; Hazard Jr, G F; Divita, G et al. (1999) Analysis of biomedical text for chemical names: a comparison of three methods. Proc AMIA Symp :176-80
