Bridging the Semantic Gap Between Research Eligibility Criteria and Clinical Data

Weng, Chunhua

Abstract

Our long-term goal is to optimize the design and conduct of human clinical research using informatics [1]. Eligibility criteria define the study population for every human study. Their clarity, accuracy and precision are crucial to the success of participant recruitment, results dissemination, and evidence synthesis. Our goal for this renewal is to build a data-driven and knowledge-based decision aid for real-life clinical researchers to optimize research eligibility criteria definition. The difference in the semantic representation of an eligibility criterion (e.g., having Type 2 diabetes mellitus) and its operationalization as a clinical variable (e.g., HbA1c ≥ 6.5% or ICD-9 code = "250.00") has been defined as the semantic gap [2], the closing of which is a grand challenge for biomedical informatics [2,3]. Our research has contributed to the in-depth understanding of this semantic gap and how it limits computational reuse and effective communication of eligibility criteria to key stakeholders of clinical research [4-9]. We have developed informatics methods to help bridge this gap, by transforming free-text eligibility criteria into semi-structured formats to aid in study cohort identification [10-13], analysis of the population representativeness of related clinical trials [14-19], text mining of common eligibility features and their trends [18,20-24], and identification of questionable exclusion criteria for mental disorder trials [25]. We used several of these methods to develop a visualization system called VITTA [17] that shows how eligibility criteria and the clinical features of clinical trial populations vary across related trials. More importantly, our research has revealed an understudied root cause of the semantic gap, which is that eligibility criteria are often poorly defined, inaccurate, nonspecific, or imprecise, and not easily translatable to the real-world electronic health record (EHR) data representations to which the criteria must be operationalized.
The advent of Big Patient Data offers an unprecedented opportunity to draw on the characteristics of real-world patients to guide and inform the data-driven precise definition of eligibility criteria [25]. By defining the characteristics of the intended study population, eligibility criteria critically influence the population representativeness of a clinical study, which further influences the tradeoff between patient safety and research results' replicability and generalizability. We hypothesize that by integrating patient data, including clinical and genomic data, with public clinical trial information, we can proactively guide investigators to optimize the precision, recruitment feasibility and representativeness of eligibility criteria. This research will demonstrate a novel data-driven and knowledge-based system to assist researchers with optimizing eligibility criteria, through innovative informatics methods for integrating proprietary and public data for deep phenotyping, target population profiling, and quantification and visualization of population representativeness.
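
To make the semantic gap concrete, the Type 2 diabetes example from the abstract can be sketched as a rule over structured patient data. This is only an illustrative sketch, not the project's method: the `Patient` record, its field names, and the `meets_t2dm_criterion` function are hypothetical, and the comparator is assumed to be "at least 6.5%", in line with the standard diagnostic threshold. Only the threshold value and the ICD-9 code 250.00 come from the abstract.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class Patient:
    """Hypothetical, highly simplified view of one patient's structured EHR data."""
    patient_id: str
    hba1c_percent: Optional[float] = None  # most recent HbA1c; None if never measured
    icd9_codes: Set[str] = field(default_factory=set)


def meets_t2dm_criterion(
    patient: Patient,
    hba1c_threshold: float = 6.5,  # threshold taken from the abstract's example
    t2dm_codes: FrozenSet[str] = frozenset({"250.00"}),  # code from the abstract's example
) -> bool:
    """Operationalize the free-text criterion 'has Type 2 diabetes mellitus'.

    The criterion counts as met if either the lab value or a diagnosis code
    matches. A patient whose diabetes is documented only in free-text notes
    would be missed, which is one face of the semantic gap.
    """
    lab_match = (
        patient.hba1c_percent is not None
        and patient.hba1c_percent >= hba1c_threshold
    )
    code_match = bool(patient.icd9_codes & t2dm_codes)
    return lab_match or code_match


if __name__ == "__main__":
    cohort = [
        Patient("p1", hba1c_percent=7.1),
        Patient("p2", hba1c_percent=6.0, icd9_codes={"250.00"}),
        Patient("p3", hba1c_percent=6.0, icd9_codes={"401.9"}),
        Patient("p4"),  # no lab value and no codes recorded
    ]
    eligible = [p.patient_id for p in cohort if meets_t2dm_criterion(p)]
    print(eligible)
```

Even in this toy form, the choice between "lab value or code" and "lab value and code", and the handling of patients with no recorded HbA1c, changes who is counted as eligible. A single free-text criterion can therefore correspond to several different cohorts in EHR data.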

Public Health Relevance

This research will increase the transparency of the population representativeness of clinical research eligibility criteria, reduce selection biases, improve research reliability, and enhance the patient-centeredness of clinical studies and thus mitigate health disparities.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM009886-11
Application #: 9983140
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Sim, Hua-Chuan

Project Start: 2017-09-14
Project End: 2021-08-31
Budget Start: 2020-09-01
Budget End: 2021-08-31
Support Year: 11
Fiscal Year: 2020

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Publications

Butler, Alex; Wei, Wei; Yuan, Chi et al. (2018) The Data Gap in the EHR for Clinical Research Eligibility Screening. AMIA Jt Summits Transl Sci Proc 2017:320-329

Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273

Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69

Son, Jung Hoon; Xie, Gangcai; Yuan, Chi et al. (2018) Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet 103:58-73

Weng, Chunhua; Goldstein, Andrew; Yuan, Chi et al. (2018) The ranking of scientists. J Biomed Inform 79:145-146

He, Zhe; Gonzalez-Izquierdo, Arturo; Denaxas, Spiros et al. (2017) Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus. AMIA Annu Symp Proc 2017:849-858

Luo, Jake; Chen, Weiheng; Wu, Min et al. (2017) Systematic data ingratiation of clinical trial recruitment locations for geographic-based query and visualization. Int J Med Inform 108:85-91

Goldstein, Andrew; Venker, Eric; Weng, Chunhua (2017) Evidence appraisal: a scoping review, conceptual framework, and research agenda. J Am Med Inform Assoc 24:1192-1203

Sen, Anando; Goldstein, Andrew; Chakrabarti, Shreya et al. (2017) The representativeness of eligible patients in type 2 diabetes trials: a case study using GIST 2.0. J Am Med Inform Assoc :

Liu, Sijia; Wang, Liwei; Ihrke, Donna et al. (2017) Correlating Lab Test Results in Clinical Notes with Structured Lab Data: A Case Study in HbA1c and Glucose. AMIA Jt Summits Transl Sci Proc 2017:221-228

Showing the most recent 10 out of 99 publications
