Bridging the Semantic Gap Between Research Eligibility Criteria and Clinical Data

Weng, Chunhua

Abstract

Averaging about $1,000 per patient1, recruitment remains an expensive bottleneck for human studies. The rapidly increasing adoption of electronic health records (EHR) has made electronic prescreening (E-screening hereafter) a practicable solution to this bottleneck. Our long-term goal is to achieve this "holy grail". The short-term goal of this competing continuation is to develop an intelligent patient query consultant that improves the accuracy and efficiency of E-screening. One of the difficulties for E-screening is the semantic gap between eligibility criteria and clinical data.2 Each eligibility criterion (e.g., hypertension) describes a patient characteristic, which is correlated with multiple data features in the EHR (e.g., orders of hypertension drugs, elevated blood pressure, and symptoms of hypertension). Moreover, each data feature may have multiple semantic representations (e.g., "SBP", "BP", or "blood pressure") from disparate data sources. For example, elevated systolic blood pressure can be recorded in varying formats in an emergency room, a doctor's office, an ICU, and an in-patient unit, but not all of these readings necessarily indicate chronic hypertension. Using clinical data to identify patients eligible for clinical research requires specialized knowledge and expert guidance, both to navigate the vast space of data features and to draw intelligent inferences from those features for eligibility determination. A user must understand the characteristics of the available data before using them to search for patients. For example, when only 5% of hypertensive patients have ICD-9 codes for hypertension but 73% of these patients have hypertension drug orders, a query for hypertensive patients built on drug information will be more effective than one built on ICD-9 codes.
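The coverage comparison in the hypertension example can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: the function name `feature_coverage`, the feature labels, and the synthetic patient identifiers are hypothetical and chosen only to mirror the 5% and 73% figures above.

```python
from typing import Dict, List, Set, Tuple


def feature_coverage(
    cohort: Set[str], feature_hits: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank data features by the fraction of a known cohort each one retrieves."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    ranked = [
        (name, len(patients & cohort) / len(cohort))
        for name, patients in feature_hits.items()
    ]
    # Highest coverage first: the best candidate feature for the query.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical data: 100 patients already known to be hypertensive.
cohort = {f"p{i}" for i in range(100)}

# Hypothetical data features and the patients each one would retrieve.
feature_hits = {
    "icd9_hypertension_code": {f"p{i}" for i in range(5)},    # 5 of 100
    "hypertension_drug_order": {f"p{i}" for i in range(73)},  # 73 of 100
}

ranking = feature_coverage(cohort, feature_hits)
for name, coverage in ranking:
    print(f"{name}: {coverage:.0%}")
```

Coverage alone ignores false positives (for instance, drugs ordered for other indications), so a real comparison would likely need a precision estimate as well.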
Even sophisticated biomedical data query tools such as i2b2, VISAGE, and STRIDE only passively translate user-specified data features into a query statement. They do not guide a researcher in selecting a data feature or its most appropriate semantic representations and data sources. Little aid is available to inform researchers about data characteristics or to help them conduct exploratory data analyses for optimal data feature selection. Mixed-initiative interaction,3 which allows human and computer to contribute collaboratively to a converged solution, can potentially fulfill this need. We hypothesize that by equipping biomedical researchers with a knowledge-based, mixed-initiative dialog system, we can maximize the efficiency and accuracy of E-screening by supporting exploratory analyses of correlated data features for query optimization. Our approach is innovative because it (1) addresses user needs for intelligent query interfaces for clinical data, (2) provides a novel data-driven approach to eligibility determination based on correlated data features, and (3) enables efficient query optimization by supporting human-computer collaborative problem solving.
We will build on the results from our first funding period for bridging the semantic gap.4-21 We developed an analysis pipeline called EliXR to construct a semantic knowledge representation for eligibility criteria,6,9,16,17 which can be used to transform free-text eligibility criteria into structured narrative.6 We developed methods to dynamically categorize eligibility criteria by data type.8 We accumulated E-screening experience from three NIH-sponsored clinical trials.7,13,21 We developed a method combining PubMed knowledge and EHR data to infer patient phenotype4 and reconciled structured and unstructured clinical data to support E-screening.18 We are prepared with methods and a preliminary understanding of the building blocks necessary to optimally translate eligibility criteria into data features; therefore, our current proposal is the logical next step.
Our specific aims are to:
1. Use mixed methods to understand the needs of biomedical researchers for query clarification and identify common strategies used by query analysts for plan optimization for complex eligibility queries.
2. Develop a knowledge-based, mixed-initiative dialog system to improve human-computer collaboration for query formulation using participatory design methods.
3. Evaluate the efficacy and usability of the mixed-initiative dialog system using a research data warehouse and two use cases: research protocol feasibility testing and trial recruitment prescreening.
We will advance the field by contributing knowledge of the needs for query support among biomedical researchers and an effective E-screening method that combines intelligent query recommendation and iterative query by review22 to improve data access for researchers through human-computer collaboration.

Public Health Relevance

This project addresses the needs of clinician scientists to understand data characteristics when using clinical data to search for potentially eligible clinical research participants. It will develop a novel intelligent query consultant to guide clinicians in interrogating clinical databases. This project has the potential to improve the efficiency and accuracy, and reduce the cost, of patient prescreening for clinical and translational research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009886-06
Application #: 8695478
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Sim, Hua-Chuan

Project Start: 2009-04-01
Project End: 2016-07-15
Budget Start: 2014-07-16
Budget End: 2015-07-15
Support Year: 6
Fiscal Year: 2014

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine

City: New York
State: NY
Country: United States
Zip Code: 10032

Publications

Butler, Alex; Wei, Wei; Yuan, Chi et al. (2018) The Data Gap in the EHR for Clinical Research Eligibility Screening. AMIA Jt Summits Transl Sci Proc 2017:320-329

Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273

Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69

Son, Jung Hoon; Xie, Gangcai; Yuan, Chi et al. (2018) Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet 103:58-73

Weng, Chunhua; Goldstein, Andrew; Yuan, Chi et al. (2018) The ranking of scientists. J Biomed Inform 79:145-146

He, Zhe; Gonzalez-Izquierdo, Arturo; Denaxas, Spiros et al. (2017) Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus. AMIA Annu Symp Proc 2017:849-858

Luo, Jake; Chen, Weiheng; Wu, Min et al. (2017) Systematic data ingratiation of clinical trial recruitment locations for geographic-based query and visualization. Int J Med Inform 108:85-91

Goldstein, Andrew; Venker, Eric; Weng, Chunhua (2017) Evidence appraisal: a scoping review, conceptual framework, and research agenda. J Am Med Inform Assoc 24:1192-1203

Sen, Anando; Goldstein, Andrew; Chakrabarti, Shreya et al. (2017) The representativeness of eligible patients in type 2 diabetes trials: a case study using GIST 2.0. J Am Med Inform Assoc :

Liu, Sijia; Wang, Liwei; Ihrke, Donna et al. (2017) Correlating Lab Test Results in Clinical Notes with Structured Lab Data: A Case Study in HbA1c and Glucose. AMIA Jt Summits Transl Sci Proc 2017:221-228

Showing the most recent 10 out of 99 publications
