Biomedical literature search is the main entry point to an ever-increasing range of information, and PubMed/MEDLINE is the most widely used service for this purpose. However, finding citations relevant to a user's information need is not always easy in PubMed. Improving our understanding of the growing population of PubMed users, their information needs, and the ways in which they meet these needs opens opportunities to improve the information services and information access that PubMed provides. One resource for understanding and characterizing the patrons of search engines is their transaction logs. Our previous investigation of user query logs led us to develop and deploy Related Queries (RQ), an application that assists users with query formulation in PubMed. Inspired by its success, we have continued using log analysis to identify research problems that are closely related to PubMed operations.

Through our analysis of PubMed logs, we have learned that people search for certain biomedical concepts more often than others and that strong associations exist between different concepts. For example, a disease name often co-occurs with gene/protein and drug names. To this end, we have organized an international challenge event on automatically identifying genes/proteins in full text. Successful techniques presented in the challenge may be used by NCBI to better link gene records to the literature. We have also developed and deployed an automatic method in PubMed that recognizes disease concepts in user queries (known as PubMed's disease sensor). Such a sensor provides PubMed users with additional relevant information beyond the articles in PubMed. For instance, this particular disease sensor links users to related chapters in GeneReviews, an expert-authored and peer-reviewed book on various genetic diseases. Finally, we have integrated the results of automatically extracting disease-drug relations from various authoritative resources (e.g. the drug indication field from DailyMed) through the use of standard ontological mappings. Such results will be used to enrich links among different records in NCBI's health-related databases.

In addition, we have continued developing computational techniques for handling queries that return zero results in PubMed. As shown in our log analysis, about 15% of PubMed searches fall into this category. In some cases there really is no document or abstract that will satisfy a particular query. However, in analyzing one month of queries submitted to PubMed, we find that, more often than not, queries that retrieved no results would have retrieved something relevant had they been constructed differently. By learning how humans modify unsuccessful queries, we have developed an automatic approach that turns failed queries into successful ones by removing query terms while maximally preserving the user's original search intent.
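The idea of rescuing a zero-result query by removing terms can be illustrated with a minimal sketch. This is not the method developed in the project, which is learned from how humans modify unsuccessful queries; it is a simple brute-force stand-in that drops as few terms as possible. The toy corpus, the `search` function, and the `relax_query` function are all hypothetical names introduced here for illustration, and the boolean-AND matching is an assumption, not a description of PubMed's retrieval.

```python
from itertools import combinations

# Toy corpus standing in for PubMed; these documents are invented for illustration.
TOY_DOCS = {
    1: "brca1 mutation breast cancer risk",
    2: "tamoxifen therapy breast cancer",
    3: "aspirin headache treatment",
}


def search(terms, docs=TOY_DOCS):
    """Return the sorted ids of documents that contain every term (boolean AND)."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    )


def relax_query(query, docs=TOY_DOCS):
    """Drop as few terms as possible until the query retrieves something.

    Tries the full query first, then every subset with one term removed,
    then two removed, and so on. Returns (relaxed_query, hits), or
    (None, []) if no non-empty subset of the terms retrieves anything.
    """
    terms = query.lower().split()
    for keep in range(len(terms), 0, -1):
        # combinations() preserves the original term order within each subset.
        for subset in combinations(terms, keep):
            hits = search(subset, docs)
            if hits:
                return " ".join(subset), hits
    return None, []


if __name__ == "__main__":
    print(relax_query("brca1 breast cancer zebrafish"))
```

Keeping the largest retrievable subset of terms is only a crude proxy for "maximally preserving the original user search intent"; a real system would need some way to judge which terms carry the intent, and exhaustive subset enumeration would not scale to long queries.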

National Library of Medicine
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 39:D38-51
Mork, James G; Bodenreider, Olivier; Demner-Fushman, Dina et al. (2010) Extracting Rx information from clinical narrative. J Am Med Inform Assoc 17:536-9
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2010) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 38:D5-16
Lu, Zhiyong; Wilbur, W John (2009) Improving accuracy for identifying related PubMed queries by an integrated approach. J Biomed Inform 42:831-8
Lu, Zhiyong; Wilbur, W John; McEntyre, Johanna R et al. (2009) Finding query suggestions for PubMed. AMIA Annu Symp Proc 2009:396-400
Lu, Zhiyong; Kim, Won; Wilbur, W John (2009) Evaluating relevance ranking strategies for MEDLINE retrieval. J Am Med Inform Assoc 16:32-6
Islamaj Dogan, Rezarta; Murray, G Craig; Neveol, Aurelie et al. (2009) Understanding PubMed user search behavior through log analysis. Database (Oxford) 2009:bap018