Effective information retrieval from the biomedical literature is essential to evidence-based clinical practice. However, research suggests that clinicians have difficulty generating sufficiently specific queries with the existing interfaces to electronic information resources. Because the biomedical research literature is growing rapidly, clinicians and researchers need tools that help them find and retrieve documents of interest. The vector space model, in which documents are represented as vectors in a high-dimensional space, is well established in information retrieval. However, because this model indexes documents on the basis of terms (or concepts, in variants of the model), the specificity with which documents can be queried is limited. In our recent research, we developed Predication-based Semantic Indexing (PSI), a vector-based model that encodes knowledge, in the form of object-relation-object triplets (or predications) extracted from MEDLINE by the SemRep system, into a vector space. In the proposed research we will develop and evaluate a new model of information retrieval based on PSI. This model will enable searching for documents using concepts and relations, in order to answer specific questions such as "what is used to treat tuberculosis?". This model represents a new direction in information retrieval research, and our hypothesis is that document representations based on predications will enable the specification of queries that are more precise than is possible with existing models. To test this hypothesis, the model will be evaluated using the OHSUMED test set and compared to the traditional vector space model using standard performance metrics.
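
To make the idea of querying by concept and relation concrete, the following is a minimal sketch and not the PSI implementation, which this abstract does not describe. It assumes random bipolar vectors for concepts and relations, elementwise multiplication as the binding operation, and a document vector formed by summing the bound relation-object pairs of its predications (subjects are ignored here for brevity). The document identifiers, the toy predications, and all function names are hypothetical.

```python
import numpy as np

DIM = 2048  # assumed dimensionality, chosen only for illustration
rng = np.random.default_rng(0)

def random_vector():
    """Random bipolar (+1/-1) vector; such vectors are nearly orthogonal in high dimensions."""
    return rng.choice([-1.0, 1.0], size=DIM)

# Toy predications written by hand for this sketch, not SemRep output.
documents = {
    "doc_a": [("haloperidol", "TREATS", "schizophrenia")],
    "doc_b": [("isoniazid", "TREATS", "tuberculosis")],
    "doc_c": [("dopamine", "ASSOCIATED_WITH", "schizophrenia")],
}

concept_vectors = {}
relation_vectors = {}
for predications in documents.values():
    for subj, rel, obj in predications:
        for concept in (subj, obj):
            concept_vectors.setdefault(concept, random_vector())
        relation_vectors.setdefault(rel, random_vector())

def bind(a, b):
    """Bind two vectors by elementwise multiplication (one common binding choice)."""
    return a * b

def encode_document(predications):
    """Sum the bound relation-object vectors of a document's predications."""
    bound = [bind(relation_vectors[rel], concept_vectors[obj])
             for _subj, rel, obj in predications]
    return np.sum(bound, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

document_vectors = {doc_id: encode_document(p) for doc_id, p in documents.items()}

def search(relation, obj):
    """Rank documents against a relation-object query such as TREATS + schizophrenia."""
    query = bind(relation_vectors[relation], concept_vectors[obj])
    scores = {doc_id: cosine(query, vec) for doc_id, vec in document_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for doc_id, score in search("TREATS", "schizophrenia"):
        print(f"{doc_id}: {score:.3f}")
```

In this sketch, a keyword query for "schizophrenia" alone could not distinguish doc_a from doc_c, whereas the bound query scores highly only for the document whose predication pairs TREATS with schizophrenia.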

Public Health Relevance

This project proposes to develop and evaluate a new model for biomedical information retrieval that encodes semantic knowledge (such as "haloperidol TREATS schizophrenia") in a high-dimensional vector space. Unlike traditional vector space models for information retrieval, in which queries are based on keywords, this model will express queries in terms of concepts and their relations. This will allow for searches that are more precise than were previously possible (such as for documents containing an answer to the question "what treats schizophrenia?"), adding a new dimension of knowledge to the vector space model of information retrieval.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21LM010826-01
Application #
7977263
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2010-09-30
Project End
2012-09-29
Budget Start
2010-09-30
Budget End
2011-09-29
Support Year
1
Fiscal Year
2010
Total Cost
$221,500
Indirect Cost
Name
University of Texas Health Science Center Houston
Department
Type
Schools of Allied Health Profes
DUNS #
800771594
City
Houston
State
TX
Country
United States
Zip Code
77225
Malec, Scott A; Wei, Peng; Xu, Hua et al. (2016) Literature-Based Discovery of Confounding in Observational Clinical Data. AMIA Annu Symp Proc 2016:1920-1929
Widdows, Dominic; Cohen, Trevor (2015) Reasoning with Vectors: A Continuous Model for Fast Robust Inference. Log J IGPL 23:141-173
Moon, Sungrim; Berster, Bjoern-Toby; Xu, Hua et al. (2013) Word Sense Disambiguation of clinical abbreviations with hyperdimensional computing. AMIA Annu Symp Proc 2013:1007-16
Berster, Bjoern-Toby; Goodwin, J Caleb; Cohen, Trevor (2012) Hyperdimensional computing approach to word sense disambiguation. AMIA Annu Symp Proc 2012:1129-38
Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R et al. (2012) Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. AMIA Annu Symp Proc 2012:940-9
Cohen, Trevor; Widdows, Dominic; Schvaneveldt, Roger W et al. (2012) Discovering discovery patterns with Predication-based Semantic Indexing. J Biomed Inform 45:1049-65