Data from Electronic Health Records (EHR) are a valuable research tool, providing information on outcomes and exposures that would be costly and difficult to obtain through primary data collection. However, EHR data capture is driven by clinical and administrative rather than research needs, necessitating substantial methodological innovation to obtain valid results. While a number of prior methodological studies have focused on reducing confounding in observational studies conducted using EHR data, they have not considered the risk of residual confounding that results when confounder variables are measured with error. The proposed study will develop novel statistical tools tailored to the EHR context to address measurement error and missing data in confounders.
Under Aim 1, we will use a recently developed statistical approach, integrated likelihood, to develop a method for confounder control using imperfect confounders that does not require validation data.
Under Aim 2, we will develop an index of sensitivity of study results to the assumption of "informative presence," i.e., that absence of information on a confounder is indicative of absence of the confounder. Novel methods will be evaluated and compared to standard approaches using simulated data and applied to existing data from a study of colon cancer recurrence. Statistical software code for these methods will be developed in the R programming language and disseminated via our project website and GitHub. This research will provide methodological tools to improve the validity of results obtained through secondary analysis of EHR-derived data.

Public Health Relevance

Electronic health records (EHR) data are valuable for conducting research on health outcomes. However, measurement error and missingness in confounders derived from the EHR can result in biased estimates of associations of interest. This project will develop novel statistical methods to minimize bias and assess sensitivity of results to violations of assumptions about missing data patterns in order to improve the validity of research using EHR data.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21CA227613-02
Application #
9840459
Study Section
Cancer, Heart, and Sleep Epidemiology A Study Section (CHSA)
Program Officer
Yu, Mandi
Project Start
2019-01-01
Project End
2020-12-31
Budget Start
2020-01-01
Budget End
2020-12-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104