Effective information retrieval from the biomedical literature is essential to supporting evidence-based clinical practice. However, research suggests that clinicians have difficulty generating sufficiently specific queries using existing interfaces to electronic information resources. Given the rapid growth of the biomedical research literature, tools are needed to enable clinicians and researchers to find and retrieve documents of interest. The vector space model, in which documents are represented as vectors in a high-dimensional space, is well established in information retrieval. However, because this model indexes documents on the basis of terms (or, in variants of the model, concepts), the specificity with which these documents can be queried is limited. In our recent research, we have developed Predication-based Semantic Indexing (PSI), a vector-based model that encodes knowledge in the form of object-relation-object triplets (or predications), extracted from MEDLINE by the SemRep system, into vector space. In the proposed research we will develop and evaluate a new model of information retrieval based on PSI. This model will enable searching for documents using concepts and relations, in order to answer specific questions such as "what is used to treat tuberculosis?". This model represents a new direction in information retrieval research, and our hypothesis is that document representations based on predications will enable the specification of queries that are more precise than is possible with existing models. To test this hypothesis, the model will be evaluated using the OHSUMED test set and compared to the traditional vector space model using standard performance metrics.
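
The idea of querying documents by relation and concept, rather than by keyword, can be illustrated with a minimal sketch. This is not the PSI implementation and does not reflect the model the project will actually build. It assumes random bipolar elemental vectors, elementwise multiplication as the binding operation, and a document vector formed by summing the bound (relation, object) pairs of its predications. The documents, the predications other than haloperidol TREATS schizophrenia, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

DIM = 2048                 # dimensionality of the vector space (assumed)
rng = random.Random(0)     # fixed seed so the sketch is repeatable
_elemental = {}            # cache of random elemental vectors by name


def elemental(name):
    """Return a random bipolar (+1/-1) vector for a concept or relation."""
    if name not in _elemental:
        _elemental[name] = [rng.choice((-1, 1)) for _ in range(DIM)]
    return _elemental[name]


def bind(a, b):
    """Bind two vectors by elementwise multiplication (an assumed operator)."""
    return [x * y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy documents, each described by (subject, relation, object) predications.
documents = {
    "doc1": [("isoniazid", "TREATS", "tuberculosis")],
    "doc2": [("haloperidol", "TREATS", "schizophrenia")],
    "doc3": [("rifampin", "TREATS", "tuberculosis"),
             ("rifampin", "CAUSES", "hepatotoxicity")],
}


def index_document(predications):
    """Sum the bound (relation, object) pair of every predication."""
    vector = [0] * DIM
    for _subject, relation, obj in predications:
        bound = bind(elemental(relation), elemental(obj))
        vector = [v + b for v, b in zip(vector, bound)]
    return vector


doc_vectors = {name: index_document(p) for name, p in documents.items()}


def search(relation, obj):
    """Rank documents for the question 'what <relation> <obj>?'."""
    query = bind(elemental(relation), elemental(obj))
    scores = [(name, cosine(vec, query)) for name, vec in doc_vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = search("TREATS", "tuberculosis")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Under these assumptions, a document mentioning tuberculosis only in some other relation would score near zero for this query, which is the kind of specificity a term-based index cannot express.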

Public Health Relevance

This project proposes to develop and evaluate a new model for biomedical information retrieval that encodes semantic knowledge (such as "haloperidol TREATS schizophrenia") in a high-dimensional vector space. Unlike traditional vector space models for information retrieval, in which queries are based on keywords, queries in this model will be expressed in terms of concepts and their relations. This will allow for searches that are more precise than were previously possible (such as for documents containing an answer to the question "what treats schizophrenia?"), adding a new dimension of knowledge to the vector space model of information retrieval.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Exploratory/Developmental Grants (R21)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
University of Texas Health Science Center Houston
Schools of Allied Health Profes
United States
Malec, Scott A; Wei, Peng; Xu, Hua et al. (2016) Literature-Based Discovery of Confounding in Observational Clinical Data. AMIA Annu Symp Proc 2016:1920-1929
Widdows, Dominic; Cohen, Trevor (2015) Reasoning with Vectors: A Continuous Model for Fast Robust Inference. Log J IGPL 23:141-173
Moon, Sungrim; Berster, Bjoern-Toby; Xu, Hua et al. (2013) Word Sense Disambiguation of clinical abbreviations with hyperdimensional computing. AMIA Annu Symp Proc 2013:1007-16
Berster, Bjoern-Toby; Goodwin, J Caleb; Cohen, Trevor (2012) Hyperdimensional computing approach to word sense disambiguation. AMIA Annu Symp Proc 2012:1129-38
Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R et al. (2012) Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. AMIA Annu Symp Proc 2012:940-9
Cohen, Trevor; Widdows, Dominic; Schvaneveldt, Roger W et al. (2012) Discovering discovery patterns with Predication-based Semantic Indexing. J Biomed Inform 45:1049-65