Over the last decade, the online search for biological information has progressed rapidly and has become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. Improving our understanding of the growing population of Entrez users, their information needs and the way in which they meet these needs opens opportunities to improve the information services and information access provided by NCBI. Among all Entrez databases, PubMed is the most used and often serves as an entry point for people to access related data in other databases.

One resource for understanding and characterizing users of the PubMed search engine is its transaction logs. Our previous investigation of PubMed search logs led us to develop and deploy several applications that assist user searching and retrieval in PubMed, including query formulation: Related Queries, Query Autocomplete and Author Name Disambiguation. Inspired by this past success, we have continued using log analysis to improve access to NCBI resources. For example, we have used user clicks to identify articles that users considered relevant to their own queries.

In 2016-2017, we used deep learning models to understand the relationship between the query and the content of potentially relevant articles. This approach is robust and outperforms both traditional IR algorithms and related shallow and deep models based on continuous representations of text, with better results on the under-specified query and term mismatch problems.

Of course, there are multiple factors that indicate whether an article is relevant to the searcher.
These include the connection between the query and the content, how recent the article is, whether other people found the article relevant, etc. PubMed's new Best Match sort order (using a Learning to Rank algorithm) combines a number of different scores and sources of information to identify the most relevant articles. This has significantly improved the results of our relevance rankings since Spring 2017.

We are continuing the effort begun by our work on TermVariants. When a term is used in a query, documents using equivalent terms are usually also desired. A seemingly trivial example is singular and plural terms. But care must be taken to avoid irrelevant articles. For example, naively applying plural rules to abbreviations is often not helpful. Guidelines are being developed to show where these expansions will be helpful.

To better understand queries, we developed a Field Sensor to fully identify the parts and aims of a query. In other words, we identify which part of the query is an author name, a journal title, a date, or key phrases describing the knowledge the searcher would like to uncover. One practical use for this tool is reminding users who are looking for information, rather than specific articles, about our improved relevance searching.

We continue to improve our handling and understanding of author names in PubMed articles. Principal Investigators on NIH-funded grants make up a particularly important subset of PubMed authors. Additional information about these authors is available from their grants. Information about published papers in grants allows us to do a better job of connecting papers and authors. These authors can be more reliably identified across different institutional affiliations and changes in research focus, and different names for the same author can even be connected.
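
The caution about plural expansion in the TermVariants discussion above can be illustrated with a minimal Python sketch. The function names, the simplified English plural rules, and the all-capitals test for abbreviations are assumptions made for illustration only; they are not the TermVariants implementation or the guidelines under development. The sketch also assumes the input term is singular.

```python
from __future__ import annotations


def looks_like_abbreviation(term: str) -> bool:
    # Hypothetical heuristic: a token whose letters are all upper case
    # (e.g. "HIV", "HIV-1") is treated as an abbreviation.
    return term.isupper()


def plural_variants(term: str) -> set[str]:
    """Return the term plus a naive plural form, skipping abbreviations."""
    variants = {term}
    if looks_like_abbreviation(term):
        # Do not pluralize abbreviations; this expansion is often unhelpful.
        return variants
    lower = term.lower()
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        variants.add(term + "es")
    elif lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        # Consonant + "y" becomes "ies" (therapy -> therapies).
        variants.add(term[:-1] + "ies")
    else:
        variants.add(term + "s")
    return variants


if __name__ == "__main__":
    for query_term in ("tumor", "therapy", "HIV"):
        print(query_term, sorted(plural_variants(query_term)))
```

A real system would need a richer notion of equivalence (irregular plurals, spelling variants, already-plural input), which is why the heuristic here should be read only as an example of guarding an expansion rule.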

Support Year: 7
Fiscal Year: 2017
Name: National Library of Medicine
Yeganova, Lana; Kim, Won; Comeau, Donald C et al. (2018) A Field Sensor: computing the composition and intent of PubMed queries. Database (Oxford) 2018:
Fiorini, Nicolas; Canese, Kathi; Starchenko, Grisha et al. (2018) Best Match: New relevance search for PubMed. PLoS Biol 16:e2005343
NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46:D8-D13
Kim, Sun; Yeganova, Lana; Comeau, Donald C et al. (2018) PubMed Phrases, an open set of coherent phrases for searching biomedical literature. Sci Data 5:180104
Fiorini, Nicolas; Lipman, David J; Lu, Zhiyong (2017) Towards PubMed 2.0. Elife 6:
Kim, Sun; Fiorini, Nicolas; Wilbur, W John et al. (2017) Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J Biomed Inform 75:122-127
NCBI Resource Coordinators (2017) Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res 45:D12-D17
NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:D7-19
Huang, Chung-Chi; Lu, Zhiyong (2016) Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation. Database (Oxford) 2016:
NCBI Resource Coordinators (2015) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 43:D6-17

Showing the most recent 10 out of 18 publications