The Veterans Health Information Systems and Technology Architecture (VistA) is an integrated system of software applications that directly supports patient care at Veterans Health Administration (VHA) healthcare facilities. To facilitate veteran care, VistA maintains a massive repository of patient-related data, including over 1.3 billion textual documents (e.g., progress notes, discharge summaries). The Computerized Patient Record System (CPRS), a front-end application that interfaces with the VistA data repository, allows clinicians to enter, review, and update information concerning all aspects of a veteran's care in their electronic health record (EHR). For veterans with complex and chronic diseases, thousands or tens of thousands of text-based progress notes may be associated with their EHR. Searching through this vast amount of textual data to find useful information can be an arduous task because CPRS lacks sophisticated search capabilities. The VistA EHR system represents the cornerstone of clinical care in the VA. This pilot study is the first step in a program of research whose ultimate goal is to make finding relevant information within a veteran's EHR easier for clinicians, thus improving processes of care and, potentially, patient outcomes. The purpose of the proposed study is to determine whether information retrieval (IR) techniques found to be useful in searching large text-based data repositories, such as the Internet or PubMed, can be applied to progress notes from VistA. In addition, we will explore whether including information about clinically relevant concepts from a medical ontology improves IR results. A total of four IR systems will be examined: (1) vector space model (baseline); (2) vector space model enhanced with ontology weights; (3) latent semantic indexing model; and (4) latent semantic indexing model enhanced with ontology weights.
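As an illustration of the baseline system named above, the following is a minimal sketch of a vector space model that ranks notes by cosine similarity to a selected note. It is not the study's implementation: the TF-IDF weighting scheme, the tokenizer, the function names, and the optional `concept_weights` mapping (a stand-in for ontology-derived weights) are all assumptions made for this example, and the sample notes are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip common punctuation."""
    stripped = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in stripped if t]


def tfidf_vectors(docs, concept_weights=None):
    """Build sparse TF-IDF vectors, one dict per document.

    concept_weights is a hypothetical term -> multiplier mapping standing in
    for ontology-derived weights; terms not listed keep a weight of 1.0.
    """
    concept_weights = concept_weights or {}
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, tf in Counter(tokens).items():
            # Smoothed inverse document frequency (an assumed variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[term] = tf * idf * concept_weights.get(term, 1.0)
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(docs, query_index, concept_weights=None):
    """Rank all other documents by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs, concept_weights)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, v)) for i, v in enumerate(vectors) if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented example notes, for illustration only.
notes = [
    "wound culture positive for mrsa, started vancomycin",
    "mrsa wound infection improving on vancomycin",
    "patient reports knee pain after physical therapy",
]
ranked = rank_similar(notes, 0)
```

Passing a mapping such as `{"mrsa": 3.0}` as `concept_weights` increases the influence of that term on the ranking, which is one simple way an ontology-weighted variant could differ from the baseline.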
The SNOMED-CT ontology will be used, with concepts weighted by their relative importance within the ontology as computed by Google's PageRank algorithm. The four IR systems will be evaluated on their ability to find progress notes relevant to a selected note, where relevance will be judged by the clinical co-investigators. The document collection to be searched will consist of all progress notes over a 17-month period from a random sample of 20 patients from the James A. Haley Veterans Medical Center (JAHVMC) who tested positive for methicillin-resistant Staphylococcus aureus (MRSA) and five who did not test positive. Because MRSA infections are associated with prolonged hospital stays and with chronic conditions, these patients form an ideal cohort for testing IR systems. The EHRs of MRSA-positive patients are likely to contain large numbers of progress notes of a heterogeneous nature (e.g., physician notes, nursing notes, laboratory results). The large quantity and diverse types of notes associated with this complex condition will provide an excellent test of the effectiveness of the proposed IR techniques. The IR systems will be evaluated using measures derived from precision and recall. The exact Wilcoxon signed-rank test, a nonparametric test, will be used to compare all pairwise combinations of IR systems on each performance measure.
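The abstract does not say which measures derived from precision and recall will be used. As a hedged illustration only, the sketch below computes three common ones (precision at k, recall at k, and average precision) from a ranked list of note identifiers and a set of notes judged relevant. The function names and the sample identifiers are hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision, recall) over the top-k retrieved notes."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant_ids)
    hits = sum(1 for note_id in ranked_ids[:k] if note_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant note appears.

    Relevant notes that are never retrieved contribute zero.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, note_id in enumerate(ranked_ids, start=1):
        if note_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Invented identifiers, for illustration only.
ranking = ["n3", "n1", "n7", "n2"]
judged_relevant = {"n1", "n2", "n9"}
p_at_2, r_at_2 = precision_recall_at_k(ranking, judged_relevant, k=2)
ap = average_precision(ranking, judged_relevant)
```

Computing one such score per selected note for each IR system yields paired samples, which is the form of data a signed-rank comparison between two systems requires.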

Public Health Relevance

Information overload has been cited as a major concern of clinicians using the Veterans Health Administration's (VHA) electronic health record (EHR) system. In particular, clinicians have raised concerns over the number and length of progress notes and the difficulty of finding information within them. This is not surprising, since the most current version of the VHA's Computerized Patient Record System (CPRS) offers only a simple exact-text-matching information retrieval (IR) system, which typically returns either no progress notes or far too many. This problem is especially noticeable for veterans with complex conditions (e.g., MRSA), whose EHRs contain thousands of notes from which information could be obtained. This pilot study seeks to develop new IR systems, based on state-of-the-art statistical techniques, that could drastically improve search capabilities, reduce information overload, and improve patient care.

Agency: National Institutes of Health (NIH)
Institute: Veterans Affairs (VA)
Type: Non-HHS Research Projects (I01)
Project #: 1I01HX000530-01A1
Application #: 8201503
Study Section: Blank (HSR7)
Project Start: 2011-11-01
Project End: 2012-10-31
Budget Start: 2011-11-01
Budget End: 2012-10-31
Support Year: 1
Fiscal Year: 2013
Total Cost:
Indirect Cost:
Name: James A. Haley VA Medical Center
Department:
Type:
DUNS #: 929194256
City: Tampa
State: FL
Country: United States
Zip Code: 33612