Title: Improving comparative effectiveness research through electronic health records continuity cohorts
PI: Joshua Lin, MD, MPH
Abstract
Epidemiologic analyses of health care data can provide critical evidence on the effectiveness and safety of therapeutics in the routine care setting, since clinical trials often exclude frail and older patients, who are the primary consumers of most medications. Electronic health record (EHR) databases contain rich clinical information vital for many comparative effectiveness studies and have been increasingly used for drug research. There are currently more than 50 EHR-based research networks in the US. It is thus critical to understand how we can conduct valid comparative clinical studies with EHR data. However, other than a few highly integrated plans, most US EHR systems do not have comprehensive capture of medical encounters across the care continuum and may miss substantial amounts of information. Exposures, co-morbidities, and health outcomes that are recorded at a clinic or hospital outside of a given EHR system are invisible to the investigator, which leads to misclassification or complete omission of essential variables. While such issues are pervasive, no prior study has quantified the magnitude of the resultant bias or determined how to remedy the situation when linkage to additional data is not feasible. To address this knowledge gap, we have combined longitudinal claims data from Medicare with EHR patient data from a large multi-center health care system as a 'gold standard' setup in which the claims data comprehensively capture medical information across care settings and provider systems and the EHR provides the necessary clinical data.
We will (1) use these 'gold standard' data to identify 'EHR continuity cohorts' for whom the EHR system captures a high proportion of all encounters and evaluate whether misclassification or omission of variables essential to comparative effectiveness research is substantially reduced within vs outside of the EHR continuity cohort; (2) develop strategies to identify the EHR continuity cohort based on a set of proxy indicators available in typical EHR databases and validate the candidate prediction rules internally in a sample within the given EHR and externally using a second EHR system that is also linked to Medicare claims data; (3) assess research validity and generalizability in the EHR continuity cohorts in several empirical studies; and (4) develop structured recommendations on how to conduct comparative effectiveness research using high-validity EHR continuity cohorts in an EHR system without linked claims data and make our program publicly available to facilitate future research using EHR-based research networks.
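As a rough illustration of the idea behind aim (1), the sketch below computes, for each patient, the share of claims-recorded encounters that also appear in the EHR, and keeps patients above a cutoff. It is a minimal sketch under stated assumptions, not the project's actual method: the function names, the encounter keys, the assumption that EHR and claims encounters can be matched on a shared key, and the 0.6 threshold are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def capture_proportions(ehr_encounters, claims_encounters):
    """Return, per patient, the fraction of claims encounters also seen in the EHR.

    Both inputs are iterables of (patient_id, encounter_key) pairs.
    Assumption: the same encounter carries the same key in both sources.
    Claims are treated as the comprehensive reference.
    """
    ehr_by_patient = defaultdict(set)
    for patient_id, encounter_key in ehr_encounters:
        ehr_by_patient[patient_id].add(encounter_key)

    claims_by_patient = defaultdict(set)
    for patient_id, encounter_key in claims_encounters:
        claims_by_patient[patient_id].add(encounter_key)

    # Only patients with at least one claims encounter appear here,
    # so the denominator is never zero.
    return {
        patient_id: len(keys & ehr_by_patient.get(patient_id, set())) / len(keys)
        for patient_id, keys in claims_by_patient.items()
    }


def continuity_cohort(proportions, threshold=0.6):
    """Select patients whose capture proportion meets a hypothetical cutoff."""
    return {pid for pid, share in proportions.items() if share >= threshold}


# Toy data (invented for illustration only).
claims = [("A", "e1"), ("A", "e2"), ("A", "e3"), ("A", "e4"),
          ("B", "e5"), ("B", "e6"), ("B", "e7"), ("B", "e8")]
ehr = [("A", "e1"), ("A", "e2"), ("A", "e3"), ("B", "e5")]

proportions = capture_proportions(ehr, claims)
cohort = continuity_cohort(proportions)
```

In this toy data, patient A has three of four claims encounters in the EHR and patient B has one of four, so only A would fall in the illustrative cohort. Aim (2) addresses the harder case where claims are unavailable and the proportion must be predicted from proxy indicators.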

Public Health Relevance

Large electronic health records databases are commonly used to assess the safety and effectiveness of drugs, but missing data on care delivered outside a single EHR system have been a pervasive source of bias in such studies in the US. Our research will produce generalizable algorithms to identify high-validity continuity cohorts in a given EHR system, allowing researchers to leverage the rich clinical data in the absence of linked claims data. This approach will systematically improve the validity of research based on data from the prevailing EHR-based research networks.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Brigham and Women's Hospital
United States
Lin, Kueiyu Joshua; Glynn, Robert J; Singer, Daniel E et al. (2018) Out-of-system Care and Recording of Patient Characteristics Critical for Comparative Effectiveness Research. Epidemiology 29:356-363
Lin, Kueiyu Joshua; Singer, Daniel E; Glynn, Robert J et al. (2018) Identifying Patients With High Data Completeness to Improve Validity of Comparative Effectiveness Research in Electronic Health Records Data. Clin Pharmacol Ther 103:899-905