Biomedical literature search is the main entry point for an ever-increasing range of information, and PubMed/MEDLINE is the most widely used service for this purpose. However, finding citations relevant to a user's information need is not always easy in PubMed. A better understanding of the growing population of PubMed users, their information needs, and the ways in which they meet those needs opens opportunities to improve the information services and access that PubMed provides. One resource for understanding and characterizing the patrons of a search engine is its transaction logs. Our investigation of user interactions through one month of PubMed logs focused on user needs, different aspects of queries (e.g., length), and user search habits.

Building on this query log analysis, we developed an automatic aid for query formulation, namely Related Queries (RQ). RQ finds popular queries that contain the user's initial search term, with the goal of helping users describe their information needs more precisely (i.e., with greater specificity than the original input). This aid has been integrated into PubMed since January 2009. Automatic assessment using clickthrough data shows that, on any given day, the feature is used between 6% and 10% of the time it is shown, suggesting that it has quickly become a popular new feature in PubMed. Inspired by its success, we are currently experimenting with other state-of-the-art methods for further improving the quality of query suggestions and for expanding beyond query specification.

In addition, we are developing computational techniques for responding to queries that return zero results in PubMed. As shown in our log analysis, about 15% of PubMed searches fall into this category. In some cases there really is no document or abstract that will satisfy a particular query. However, in analyzing one month of queries submitted to PubMed, we find that, more often than not, queries that retrieve no results would retrieve something relevant if they were constructed differently. We are currently identifying characteristics of unsuccessful queries and training computers to learn automatically the changes that users most often apply when constructing new, corrected queries.

Log analysis can not only help PubMed search as a whole; it can also play an important role in developing tools that improve the links between different Entrez databases. Through our analysis of PubMed logs, we learn that people search for certain biomedical concepts more often than others and that strong associations exist between different concepts. For example, a disease name often co-occurs with gene/protein and drug names. As a result, we have worked toward the development of different PubMed sensors: automatic tools for recognizing biomedical concepts and building links to related data outside the biomedical literature. These will allow PubMed users to be more readily drawn to related data that could lead to serendipitous discoveries.
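
The Related Queries idea described above (suggesting popular logged queries that contain the user's initial search term) can be illustrated with a minimal sketch. This is not the PubMed implementation: the function name `related_queries`, the whole-word containment test, the tie-breaking rule, and the tiny in-memory log are all assumptions made for illustration.

```python
from collections import Counter


def related_queries(term, logged_queries, top_n=5):
    """Return the most frequent logged queries that contain `term`.

    Hypothetical helper: a logged query qualifies when it contains the
    tokens of `term` as a contiguous sequence and is not identical to it.
    """
    term_tokens = term.lower().split()
    if not term_tokens:
        return []
    normalized_term = " ".join(term_tokens)

    # Normalize case and whitespace, then count how often each query occurs.
    counts = Counter(" ".join(q.lower().split()) for q in logged_queries)

    def contains_term(query_tokens):
        n = len(term_tokens)
        return any(
            query_tokens[i:i + n] == term_tokens
            for i in range(len(query_tokens) - n + 1)
        )

    candidates = [
        (query, count)
        for query, count in counts.items()
        if query != normalized_term and contains_term(query.split())
    ]
    # Most popular first; alphabetical order breaks ties (an assumption).
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:top_n]]


# Made-up log entries, for illustration only.
log = [
    "breast cancer", "breast cancer",
    "breast cancer treatment", "breast cancer treatment",
    "breast cancer treatment", "breast cancer brca1",
    "lung cancer", "cancer",
]
suggestions = related_queries("breast cancer", log)
```

With this made-up log, the suggestions for "breast cancer" are the longer, more specific queries that include it, ordered by how often they were issued.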

Support Year: 2
Fiscal Year: 2009
Total Cost: $773,992
Name: National Library of Medicine
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 39:D38-51
Mork, James G; Bodenreider, Olivier; Demner-Fushman, Dina et al. (2010) Extracting Rx information from clinical narrative. J Am Med Inform Assoc 17:536-9
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A et al. (2010) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 38:D5-16
Lu, Zhiyong; Wilbur, W John (2009) Improving accuracy for identifying related PubMed queries by an integrated approach. J Biomed Inform 42:831-8
Lu, Zhiyong; Wilbur, W John; McEntyre, Johanna R et al. (2009) Finding query suggestions for PubMed. AMIA Annu Symp Proc 2009:396-400
Lu, Zhiyong; Kim, Won; Wilbur, W John (2009) Evaluating relevance ranking strategies for MEDLINE retrieval. J Am Med Inform Assoc 16:32-6
Islamaj Dogan, Rezarta; Murray, G Craig; Neveol, Aurelie et al. (2009) Understanding PubMed user search behavior through log analysis. Database (Oxford) 2009:bap018