The MEDLINE database (used by the service at www.ncbi.nlm.nih.gov/pubmed/) has more than 18 million publication entries, with a total size of almost 75GB when stored as XML files. PubMed handles over 2 million searches daily and is critical to the work of a wide range of biomedical researchers. Making PubMed searches faster, more comprehensive, and more intuitive will have a broad positive impact on a wide array of researchers, clinicians and students. This proposal will investigate new techniques for improving search on PubMed. We have three specific aims.
Our first aim is to develop techniques for search-as-you-type data exploration. We will study how to make the system continuously display search results as the user types keywords or modifies the query. This will help users explore information and see results "on the fly." Our second aim is to develop techniques for fuzzy search. Users are often frustrated by failing to find relevant publications due to typographical errors or their limited knowledge about the correct spelling of entities such as gene or author names. We will study how to solve this problem with fuzzy search. For example, if a user issues the query "apoptoses dexamethason", our techniques will find publications that include the similar keywords "apoptosis dexamethasone".
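To make the first two aims concrete, the following is a minimal sketch of type-ahead, fuzzy keyword matching. It is not the method used by the prototype system: it assumes Levenshtein edit distance as the similarity measure, treats the last keyword as a partially typed prefix, and scans a small in-memory list of titles. The function names, the `max_edits` threshold, and the sample titles are all hypothetical.

```python
import re


def _last_dp_row(query: str, word: str) -> list[int]:
    """Final row of the Levenshtein table: entry j is dist(query, word[:j])."""
    prev = list(range(len(word) + 1))
    for i, cq in enumerate(query, start=1):
        curr = [i]
        for j, cw in enumerate(word, start=1):
            cost = 0 if cq == cw else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev


def edit_distance(query: str, word: str) -> int:
    """Edit distance between the query keyword and the whole word."""
    return _last_dp_row(query, word)[-1]


def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between the query and any prefix of the word."""
    return min(_last_dp_row(query, word))


def fuzzy_type_ahead(query: str, titles: list[str], max_edits: int = 1) -> list[str]:
    """Return titles matching every keyword within max_edits (assumed threshold).

    Completed keywords are compared against whole words; the last keyword is
    assumed to be still being typed, so it is compared against word prefixes.
    """
    keywords = re.findall(r"[a-z0-9]+", query.lower())
    if not keywords:
        return []
    *complete, partial = keywords
    results = []
    for title in titles:
        words = re.findall(r"[a-z0-9]+", title.lower())
        complete_ok = all(
            any(edit_distance(k, w) <= max_edits for w in words) for k in complete
        )
        partial_ok = any(prefix_edit_distance(partial, w) <= max_edits for w in words)
        if complete_ok and partial_ok:
            results.append(title)
    return results


# Hypothetical sample titles, for illustration only.
TITLES = [
    "Dexamethasone induces apoptosis in thymocytes",
    "Aspirin and cardiovascular risk",
]

if __name__ == "__main__":
    print(fuzzy_type_ahead("apoptoses dexamethason", TITLES))
```

With the query from the example above, "apoptoses" is one substitution away from "apoptosis" and "dexamethason" is an exact prefix of "dexamethasone", so only the first sample title is returned. A linear scan like this is only workable for a handful of records; a collection the size of MEDLINE would presumably need an index structure instead.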
Our third aim is to study semantics-related challenges in type-ahead search. We will study how to support more powerful search using synonymous terms, support on-the-fly query suggestions, support form-based "advanced" type-ahead search, and improve search using query logs. We have already developed a system (http://ipubmed.ics.uci.edu) to demonstrate the value of our proposed techniques. It supports type-ahead, fuzzy search on the entire MEDLINE collection (as of September 2009) of more than 18 million publication records. This prototype has allowed us to obtain feedback from biomedical researchers in a variety of specialties at multiple institutions to ensure our improvements are useful to many groups of users.

Public Health Relevance

Narrative (Li, C): In this project, we study how to improve PubMed by supporting type-ahead, fuzzy search. Making PubMed searches faster, more comprehensive, and more intuitive will have a broad positive impact on a wide array of researchers, clinicians and students.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21LM010143-01A1
Application #
7991751
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2010-09-30
Project End
2012-09-29
Budget Start
2010-09-30
Budget End
2011-09-29
Support Year
1
Fiscal Year
2010
Total Cost
$232,490
Indirect Cost
Name
University of California Irvine
Department
Biostatistics & Other Math Sci
Type
Other Domestic Higher Education
DUNS #
046705849
City
Irvine
State
CA
Country
United States
Zip Code
92697