Over the last decade, online search for biological information has progressed rapidly and become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. A better understanding of the growing population of Entrez users, their information needs, and the ways in which they meet those needs opens opportunities to improve the information services and access provided by NCBI. One resource for understanding and characterizing the users of search engines is their transaction logs. Our previous investigation of PubMed query logs led us to develop and deploy several applications that assist users with query formulation in PubMed, namely Related Queries and Query Autocomplete. Inspired by their success, we have continued to use log analysis to identify research problems closely related to NCBI operations. Among all Entrez databases, PubMed is the most heavily used and often serves as an entry point to related data in other Entrez databases. In 2013-2014, we compared two automatic approaches for computing relatedness between journals: one compares the similarity of the articles published in the two journals, and the other compares the articles in the two journals that were accessed by the same set of users in PubMed query logs. The two methods are thus built on distinct sources: article content vs. usage. Accordingly, we found significant differences between the results of the two approaches. Furthermore, we compared both methods to a third approach that is based on article citation information.
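As a rough illustration of the usage-based idea, the sketch below scores two journals by the overlap between the sets of users who accessed their articles. The Jaccard measure, the `(user_id, journal)` log format, and all function names and toy records are assumptions made for this example; the abstract does not specify how relatedness was actually computed.

```python
from collections import defaultdict
from itertools import combinations


def journal_user_sets(access_log):
    """Map each journal to the set of users who accessed its articles."""
    users_by_journal = defaultdict(set)
    for user_id, journal in access_log:
        users_by_journal[journal].add(user_id)
    return users_by_journal


def jaccard(a, b):
    """Jaccard overlap of two sets (an assumed measure, not the project's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def usage_relatedness(access_log):
    """Score every journal pair by the overlap of their user sets."""
    users = journal_user_sets(access_log)
    return {
        (j1, j2): jaccard(users[j1], users[j2])
        for j1, j2 in combinations(sorted(users), 2)
    }


# Hypothetical toy log of (user_id, journal) access records.
toy_log = [
    ("u1", "Bioinformatics"), ("u1", "Nucleic Acids Res"),
    ("u2", "Bioinformatics"), ("u2", "Nucleic Acids Res"),
    ("u3", "Bioinformatics"), ("u4", "Lancet"),
]
scores = usage_relatedness(toy_log)
```

A content-based counterpart would replace the user sets with some representation of article text, which is why the two approaches can rank journal pairs differently.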
In a case study, the comparison shows that the usage-based method produces results similar to those based on article citation information. This is not unexpected, because previous research has suggested correlations between article access and citations. Taken together, this research demonstrates that content similarity and usage information in query logs can complement one another in finding related items (e.g. related journals; related articles). The article usage information in query logs could be particularly useful when citation information is not available. In 2011, we analyzed PubMed logs to characterize users' information needs and search behaviors. One of our main findings was that PubMed users frequently search by author name. However, author name ambiguity (e.g. multiple authors in PubMed share the name 'Zhiyong Lu') may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, an author name disambiguation system based on author profiling and agglomerative clustering was recently developed. In particular, we contributed to the design and evaluation of this project in 2013-2014. When our system was integrated into the PubMed search engine, the overall click-through rate of PubMed users on author name query results improved from 34.9% to 36.9%.
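To make the clustering step concrete, here is a minimal sketch of threshold-based agglomerative clustering over papers that share one author name. It is not the deployed system: the feature-set profiles, the Jaccard similarity, the average-linkage rule, the 0.3 threshold, and the toy data are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(c1, c2, profiles):
    """Mean pairwise similarity between the members of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)


def agglomerate(profiles, threshold=0.3):
    """Repeatedly merge the most similar pair of clusters.

    Stops once the best remaining similarity falls below `threshold`
    (a hypothetical cutoff, not a value taken from the project).
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = average_link(clusters[a], clusters[b], profiles)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical profiles for four papers carrying the same author name.
profiles = [
    {"coauthor:A", "affil:Institute X", "topic:text mining"},
    {"coauthor:A", "affil:Institute X", "topic:information retrieval"},
    {"coauthor:B", "topic:polymer chemistry", "affil:University Y"},
    {"coauthor:B", "topic:polymer chemistry", "affil:University Z"},
]
clusters = agglomerate(profiles)
```

Each resulting cluster is treated as one distinct author, so an author name query could be served from the cluster that matches the intended person rather than from every paper carrying that name.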

Support Year: 4
Fiscal Year: 2014
Name: National Library of Medicine
Yeganova, Lana; Kim, Won; Comeau, Donald C et al. (2018) A Field Sensor: computing the composition and intent of PubMed queries. Database (Oxford) 2018:
Fiorini, Nicolas; Canese, Kathi; Starchenko, Grisha et al. (2018) Best Match: New relevance search for PubMed. PLoS Biol 16:e2005343
NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46:D8-D13
Kim, Sun; Yeganova, Lana; Comeau, Donald C et al. (2018) PubMed Phrases, an open set of coherent phrases for searching biomedical literature. Sci Data 5:180104
NCBI Resource Coordinators (2017) Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res 45:D12-D17
Fiorini, Nicolas; Lipman, David J; Lu, Zhiyong (2017) Towards PubMed 2.0. Elife 6:
Kim, Sun; Fiorini, Nicolas; Wilbur, W John et al. (2017) Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J Biomed Inform 75:122-127
NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:D7-19
Huang, Chung-Chi; Lu, Zhiyong (2016) Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation. Database (Oxford) 2016:
NCBI Resource Coordinators (2015) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 43:D6-17

Showing the most recent 10 out of 18 publications