Query Log Analysis for Improving User Access to NCBI Web Services

Lu, Zhiyong

Abstract

Over the last decade, the online search for biological information has progressed rapidly and has become an integral part of any scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. Improving our understanding of the growing population of Entrez users, their information needs, and the way in which they meet these needs opens opportunities to improve the information services and information access provided by NCBI. Among all Entrez databases, PubMed is the most used and often serves as an entry point for people to access related data in other Entrez databases. Tools that aid PubMed searching include query suggestion, query expansion, and spelling correction. Dedicated best match algorithms aid navigational queries by ignoring minor errors, and aid informational searches by using machine learning to combine relevant signals such as article popularity, publication date and type, and query-document relevance score. Additional valuable aids include identifying related articles and author name disambiguation. PubMed Labs provides a place to trial and improve new search features. It features a clean, mobile-friendly design tailored specifically to small-screen devices and a platform for users to provide feedback guiding future work. While search has usually focused on full documents or references, the value of sentence search is rising. It can identify specific statements rather than whole articles on a general topic. Our new tool, LitSense, provides sentence-level search, making sense of the biomedical literature at the sentence level. A specific use of sentence similarity is to aid the curation efforts in the Conserved Domain Database (CDD). To this end, LitSense has been used both to find sentences in PubMed articles already used to create CDD summaries and to identify new sentences closely related to existing CDD summaries. For using sentences in Deep Learning tasks, BioSentVec is the first sentence encoder built specifically for the biomedical domain. It captures biomedical semantics better than general-domain encoders. Of course, word embeddings remain the primary method of using Deep Learning in NLP tasks. BioWordVec uses subword information and MeSH to generate biomedical word embeddings that can significantly improve performance. Biomedical terminology often includes important subword information, and the semantic information available in ontologies such as MeSH is meaningful. A generic word embedding cannot take advantage of this valuable supplemental information. These machine learning methods benefit from having a large amount of text available. To that end, a Web API serves BioC versions of the PMC Open Access Subset and Author Manuscripts. This is a continuously updated complement to our existing FTP service. The documents are available in either JSON or XML, in both ASCII and Unicode encodings.
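
The point about subword information can be illustrated with a small sketch. It assumes fastText-style character n-grams (word boundary markers `<` and `>`, n-gram lengths 3 to 6); the abstract does not state the settings BioWordVec actually uses, and the function names `char_ngrams` and `subword_overlap` are hypothetical. The sketch only shows how related biomedical terms share subword units. It does not train embeddings or use MeSH.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of `word`.

    Assumption: fastText-style extraction, where the word is wrapped in
    '<' and '>' so that prefixes and suffixes get distinct n-grams.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        # range() is empty when the marked word is shorter than n
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def subword_overlap(a, b):
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0  # both words too short to produce any n-gram
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # Terms sharing the suffix "-itis" overlap; unrelated terms do not.
    for pair in [("hepatitis", "nephritis"), ("hepatitis", "fracture")]:
        print(pair, round(subword_overlap(*pair), 3))
```

In a subword-aware embedding model, a word vector is typically built from the vectors of such n-grams, so terms that share a meaningful fragment (for example the suffix "-itis") start from related representations even when one of them is rare in the training text.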

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM000001-09
Application #: 10007518
Support Year: 9
Fiscal Year: 2019

Institution

Name: National Library of Medicine

Related projects


NIH 2019 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2018 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2017 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2016 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2015 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2014 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine
NIH 2013 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine	$205,463
NIH 2012 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine	$260,305
NIH 2011 ZIA LM	Query Log Analysis for Improving User Access to NCBI Web Services	Lu, Zhiyong / National Library of Medicine	$499,678

Publications

Yeganova, Lana; Kim, Won; Comeau, Donald C et al. (2018) A Field Sensor: computing the composition and intent of PubMed queries. Database (Oxford) 2018:

Fiorini, Nicolas; Canese, Kathi; Starchenko, Grisha et al. (2018) Best Match: New relevance search for PubMed. PLoS Biol 16:e2005343

NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46:D8-D13

Kim, Sun; Yeganova, Lana; Comeau, Donald C et al. (2018) PubMed Phrases, an open set of coherent phrases for searching biomedical literature. Sci Data 5:180104

Fiorini, Nicolas; Lipman, David J; Lu, Zhiyong (2017) Towards PubMed 2.0. Elife 6:

Kim, Sun; Fiorini, Nicolas; Wilbur, W John et al. (2017) Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J Biomed Inform 75:122-127

NCBI Resource Coordinators (2017) Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res 45:D12-D17

NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:D7-19

Huang, Chung-Chi; Lu, Zhiyong (2016) Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation. Database (Oxford) 2016:

NCBI Resource Coordinators (2015) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 43:D6-17

Showing the most recent 10 out of 18 publications
