Our long-term objective is to enlarge the scope and efficiency of clinical research through enhanced use of clinical data to support clinical research decisions. This proposal aims to improve the use of electronic health records (EHR) to automate clinical trials eligibility screening by developing a new semantic alignment framework. Clinical trials research is an important step for translating breakthroughs in basic biomedical sciences into knowledge that will benefit clinical practice and human health. However, a significant obstacle is identifying eligible participants. Eighty-six percent of all clinical trials are delayed in patient recruitment for from one to six months and 13% are delayed by more than six months. Enrollment delay is expensive. In a recent large, multi-center trial, about 86.8 staff hours and more than $1000 was spent to enroll each participant. Ineffective enrollment also produces a big social cost in that up to 60% of patients can miss being identified. The broad deployment of EHR systems has created unprecedented opportunities to solve the problem because EHR systems contain a rich source of information about potential participants. However, it is often a knowledge-intensive, time-consuming, and inefficient manual procedure to match eligibility criteria such as """"""""renal in- sufficiency"""""""" to clinical data such as """"""""serum creatinine = 1.0 mg/dl for an 80-year old white female patient."""""""" This enduring challenge is partly caused by the disconnection between abstract and ambiguous eligibility criteria and highly specific clinical data manifestations;we call this a semantic gap. Despite earlier work on computer-based clinical guidelines and protocols, limited effort has been devoted to support automatic matching between concepts and their manifestations in patient phenotypes such as signs and symptoms. We hypothesize that we can characterize the semantic gap and design a knowledge-based, natural-language processing assisted semantic alignment framework to bridge the semantic gap. Therefore, our specific aims are: (1) to investigate the semantic gap between clinical trials eligibility criteria and clinical data;(2) to design a concept-based, computable knowledge representation for eligibility criteria;(3) to design a semantic alignment framework linking an eligibility criteria knowledge base and a clinical data warehouse to generate semantic queries for eligibility identification;and (4) to evaluate the utility of the semantic alignment framework. This research is novel and unique in that (1) there are no prior studies about the semantic gap between eligibility criteria and clinical data;and (2) for the first time, we design a semantic alignment framework to automatically match eligibility criteria to clinical data. The research team comprising expertise from the Department of Biomedical Informatics at Columbia University and the Division of General Medicine from UCSF are uniquely positioned to carry out this research, given the experience of the team (medical knowledge representation, natural language processing, controlled clinical terminology, ontology-based semantic reasoning, data mining, statistics, health data organization, semantic harmonization, and clinical trials), the availability of a repository of 13 years of data on 2 million patients, and the availability of a natural language processor called MedLEE to convert millions of narrative reports into richly coded clinical data.

Public Health Relevance

This research has the potential to improve process efficiency and accuracy, as well as to reduce cost and required human skills for clinical trials eligibility screening. The ultimate goal is to accelerate scientific discovery of more effective treatments for illness.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
3R01LM009886-02S1
Application #
8056227
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2010-04-15
Project End
2011-04-14
Budget Start
2010-04-15
Budget End
2011-04-14
Support Year
2
Fiscal Year
2010
Total Cost
$177,422
Indirect Cost
Name
Columbia University (N.Y.)
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
Butler, Alex; Wei, Wei; Yuan, Chi et al. (2018) The Data Gap in the EHR for Clinical Research Eligibility Screening. AMIA Jt Summits Transl Sci Proc 2017:320-329
Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273
Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69
Son, Jung Hoon; Xie, Gangcai; Yuan, Chi et al. (2018) Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes. Am J Hum Genet 103:58-73
Weng, Chunhua; Goldstein, Andrew; Yuan, Chi et al. (2018) The ranking of scientists. J Biomed Inform 79:145-146
Si, Yuqi; Weng, Chunhua (2017) An OMOP CDM-Based Relational Database of Clinical Research Eligibility Criteria. Stud Health Technol Inform 245:950-954
Sen, Anando; Ryan, Patrick B; Goldstein, Andrew et al. (2017) Correlating eligibility criteria generalizability and adverse events using Big Data for patients and clinical trials. Ann N Y Acad Sci 1387:34-43
He, Zhe; Langford, Aisha (2017) Comparative Analysis of Geriatric and Adult Drug Clinical Trials on ClinicalTrials.gov. Stud Health Technol Inform 245:1265
Kang, Tian; Zhang, Shaodian; Tang, Youlan et al. (2017) EliIE: An open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc 24:1062-1071
He, Zhe; Gonzalez-Izquierdo, Arturo; Denaxas, Spiros et al. (2017) Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus. AMIA Annu Symp Proc 2017:849-858

Showing the most recent 10 out of 99 publications