Locating important phrases in natural language text is useful for indexing and for placing hyperlinks in text; in either case, the goal is to improve access to the textual material. In the past, the most common method for locating phrases has relied on a part-of-speech tagger. We have developed a new approach that uses scoring algorithms to rank phrases by how useful they are likely to be. A number of different scoring methods have been developed and tested. These are being combined with methods for stemming and for finding inflectional variants of phrases that are synonymous for retrieval purposes. The Unified Medical Language System (UMLS) is also being used to find synonymous phrases for indexing. These methods are being applied to find useful phrases in NCBI's electronic textbook project, which is currently online but still under development.
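The abstract does not say which scoring algorithms are used. As a purely illustrative sketch, and not the project's own method, the following ranks adjacent word pairs by pointwise mutual information (PMI), one common way to score candidate phrases without a part-of-speech tagger. The function name `rank_bigrams`, the `min_count` threshold, and the crude tokenizer are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def rank_bigrams(text, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not the scoring used by the project.
    """
    # Lowercase and keep runs of letters; a deliberately crude tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append((math.log2(p_pair / (p_w1 * p_w2)), f"{w1} {w2}"))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    sample = "gene expression in yeast. gene expression in mice. the gene is in the cell"
    for score, phrase in rank_bigrams(sample):
        print(f"{score:.2f}  {phrase}")
```

A sketch like this scores every frequent pair, including ones that end in a function word such as "in". A practical system would presumably filter or re-rank such candidates and normalize variants, for instance with the stemming and synonym handling the abstract mentions.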
Yu, Hong; Kim, Won; Hatzivassiloglou, Vasileios et al. (2007) Using MEDLINE as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles. J Biomed Inform 40:150-9
Wilbur, W John; Kim, Won; Xie, Natalie (2006) Spelling correction in the PubMed search engine. Inf Retr Boston 9:543-564
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit (2006) New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinformatics 7:356
Kim, Won; Wilbur, W John (2005) A strategy for assigning new concepts in the MEDLINE database. AMIA Annu Symp Proc :395-9
Smith, L; Wilbur, W J (2004) Retrieving definitional content for ontology development. Comput Biol Chem 28:387-91
Yeganova, L; Smith, L; Wilbur, W J (2004) Identification of related gene/protein names based on an HMM of name variations. Comput Biol Chem 28:97-107
Smith, L; Rindflesch, T; Wilbur, W J (2004) MedPost: a part-of-speech tagger for bioMedical text. Bioinformatics 20:2320-1
Smith, L; Yeganova, L; Wilbur, W J (2003) Hidden Markov models and optimized sequence alignments. Comput Biol Chem 27:77-84
Aronson, A R; Bodenreider, O; Chang, H F et al. (2000) The NLM Indexing Initiative. Proc AMIA Symp :17-21
Kim, W; Wilbur, W J (2000) Corpus-based statistical screening for phrase identification. J Am Med Inform Assoc 7:499-511