Statistical Methods for Analyzing Electronic Health Record Data

Chen, Jinbo

Abstract

The overarching goal of this proposal is to develop innovative statistical methods for Electronic Health Record (EHR) based research. Clinically relevant information from the EHR permits the derivation of a rich collection of phenotypes. Unfortunately, because the data are collected primarily for clinical rather than research purposes, the true status of any given individual with respect to the trait of interest is not necessarily known. A common study design is to use structured clinical data elements to identify case and control groups on which subsequent analyses are based. To minimize identification error, separate but complementary rules are commonly developed to select individuals for the case and control groups, with case selection rules emphasizing a high positive predictive value (PPV) and control selection rules emphasizing a high negative predictive value (NPV). The accuracy of control identification is usually high, as the sheer number of available controls permits highly restrictive definitions constructed to ensure a high NPV. In contrast, contamination by subjects who are not cases plagues case selection, as the need for adequate sample size, and by extension power, must be balanced against the probability that those selected truly have the trait of interest. We call these non-cases "ineligibles" because they do not satisfy the definition of controls. Ignoring inaccuracy in case identification by treating ineligibles as true cases can lead to biased analysis. No statistical methods yet exist for addressing the bias resulting from this unique challenge of case contamination in EHR-based case-control studies. In particular, statistical methods for the classical misclassification problem, where labels for cases and controls are switched, are not applicable. The current standard practice limits analysis to a further selected subset with high PPV, which may yield practically ignorable bias but is not efficient.
This proposal aims to fill this gap by developing efficient statistical methods for settings in which "gold standard" case versus non-case status is available from medical chart review for a validation subset of candidate cases. Our methods, accompanied by comprehensive and user-friendly software, will offer researchers a rich arsenal of statistical methods and tools for analyzing EHR data.
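
The bias from treating ineligibles as true cases can be illustrated with a small large-sample calculation. The sketch below is not the proposal's method. It assumes a single binary exposure, and it assumes, purely for illustration, that ineligibles have the same exposure prevalence as controls. The function names and all numeric values are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def naive_odds_ratio(p_case, p_control, p_ineligible, ppv):
    """Large-sample odds ratio obtained when every candidate case is
    treated as a true case.

    p_case, p_control, p_ineligible: exposure prevalence in true cases,
    controls, and ineligibles (hypothetical inputs).
    ppv: fraction of candidate cases that are true cases.
    """
    # The candidate case group is a mixture of true cases and ineligibles.
    p_candidate = ppv * p_case + (1.0 - ppv) * p_ineligible
    return odds(p_candidate) / odds(p_control)


# Hypothetical exposure prevalences: 40% in true cases, 20% in controls,
# and (by assumption) 20% in ineligibles.
true_or = odds(0.40) / odds(0.20)
print(f"true odds ratio: {true_or:.3f}")
for ppv in (1.0, 0.9, 0.7):
    naive = naive_odds_ratio(0.40, 0.20, 0.20, ppv)
    print(f"PPV = {ppv:.1f}: naive odds ratio = {naive:.3f}")
```

Under these assumptions the naive odds ratio equals the true one only when the PPV is 1 and moves toward the null as the PPV falls. With a different exposure prevalence among ineligibles, the bias could go in either direction.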

Public Health Relevance

A unique challenge in studying binary phenotypes using electronic health records (EHRs) is that candidate cases, who are necessarily identified by Boolean rules or probabilistic algorithms, are a mixture of true cases and non-cases. We propose a range of innovative statistical methods to address this case contamination problem, which is unique to EHR-based case-control studies. Our methods, accompanied by comprehensive and user-friendly software, will provide the necessary statistical methods and tools to facilitate EHR-based research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project (R01)
Project #: 1R01HL138306-01A1
Application #: 9523509
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Mussolino, Michael Eugene

Project Start: 2018-09-01
Project End: 2021-06-30
Budget Start: 2018-09-01
Budget End: 2019-06-30
Support Year: 1
Fiscal Year: 2018

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2020 R01 HL	Statistical Methods for Analyzing Electronic Health Record Data Chen, Jinbo / University of Pennsylvania
NIH 2019 R01 HL	Statistical Methods for Analyzing Electronic Health Record Data Chen, Jinbo / University of Pennsylvania
NIH 2018 R01 HL	Statistical Methods for Analyzing Electronic Health Record Data Chen, Jinbo / University of Pennsylvania
