Adjusting for selection bias due to missing data in electronic health records-based research

Thaweethai, Tanayott

Abstract

The adoption of electronic health records (EHR) in routine healthcare has resulted in a hugely promising source of data for public health and medical research. Because EHR include rich data on large populations at relatively low cost, many researchers have turned to observational studies using EHR as an alternative to conducting randomized studies that are often prohibitively expensive and time-consuming to perform. However, data are not collected for research purposes, and the potential for selection bias is high when analyses are restricted to patients with complete data. Standard methods to adjust for selection bias due to missing data, such as inverse probability weighting (IPW) and multiple imputation (MI), fail to address the complex nature of EHR data. Specifically, these methods tend to oversimplify the interplay of numerous decisions by patients, physicians, and insurers that collectively determine whether complete data is observed. One method for addressing selection bias due to missing data involves breaking down the complex process that governs whether or not a patient has complete data into a series of more manageable sub-mechanisms. This method involves characterizing the data provenance, or the process by which data appears in EHR. Statistical models can then be built for selection at each sub-mechanism to better reflect the true data provenance. A framework for estimation has been developed in which IPW is used to adjust for selection at every sub-mechanism. Since MI is generally more efficient than IPW, strategies for 'blended analyses' will be developed that simultaneously implement IPW and MI under the modularized specification. Estimation and inferential procedures under this framework will be established, and extensions to Rubin's rules for the variance of estimators that combine results across multiply imputed datasets in this framework will be derived.
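As a rough illustration of the modularized idea described above, the following Python sketch multiplies per-sub-mechanism selection probabilities into a single inverse probability weight and pools estimates across imputed datasets with the standard form of Rubin's rules. It is not the proposal's estimator: the function names, the assumption that the sub-mechanism probabilities have already been estimated (for example by separate logistic regressions), and the toy numbers are all hypothetical, and the extensions to Rubin's rules mentioned in the abstract are not shown.

```python
import numpy as np


def modular_ipw_weights(sub_probs, complete):
    """Combine sub-mechanism selection probabilities into one IPW weight.

    sub_probs: (n_patients, n_sub_mechanisms) array; entry [i, k] is the
        (assumed, pre-estimated) probability that patient i passes
        sub-mechanism k given that all earlier sub-mechanisms were passed.
    complete:  boolean array, True where the patient has complete data.
    """
    sub_probs = np.asarray(sub_probs, dtype=float)
    complete = np.asarray(complete, dtype=bool)
    # Probability of complete data is the product over sub-mechanisms.
    overall = sub_probs.prod(axis=1)
    weights = np.zeros(len(complete))
    weights[complete] = 1.0 / overall[complete]
    return weights


def weighted_mean(y, weights):
    """Normalized (Hajek-type) weighted mean over patients with weight > 0."""
    y = np.asarray(y, dtype=float)
    keep = weights > 0
    return float(np.sum(weights[keep] * y[keep]) / np.sum(weights[keep]))


def rubin_pool(estimates, variances):
    """Standard Rubin's rules for M completed-data analyses."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    within = variances.mean()           # average within-imputation variance
    between = estimates.var(ddof=1)     # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return float(q_bar), float(total)


# Toy example: two hypothetical sub-mechanisms, three patients.
toy_probs = [[0.5, 0.5], [1.0, 0.5], [1.0, 1.0]]
toy_complete = [True, True, False]
toy_weights = modular_ipw_weights(toy_probs, toy_complete)
toy_mean = weighted_mean([1.0, 4.0, np.nan], toy_weights)
```

In a blended analysis, a weighted estimate of this kind would be computed within each imputed dataset and the results then pooled; how the variance should be combined in that setting is exactly what the proposal sets out to derive.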
IPW and MI fail to produce consistent estimates when data is missing not at random (MNAR); that is, when the probability that some covariate or outcome is measured depends on the value of the covariate itself, or other factors that are not completely measured in the EHR. Methods for sensitivity analyses will be developed to assess the extent to which estimators yielded by these methods are impacted by such unobserved data. The methods described in these aims will be applied to EHR-derived data that include long-term health outcomes among 13,000 individuals with type 2 diabetes who underwent bariatric surgery between 1997 and 2013. Specifically, this research will answer open questions about the efficacy and safety of bariatric surgery in the treatment of patients with obesity and type 2 diabetes, and will consider how rates of micro- and macrovascular complications associated with diabetes differ between patients undergoing alternative surgical procedures. Robust software will be developed that provides researchers valid, practical, and user-friendly tools for the identification, characterization, and control of selection bias in EHR-based research.
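To make the MNAR concern concrete, one generic form of sensitivity analysis (not necessarily the one this project will develop) assumes that the probability of observing an outcome y follows a logistic model, expit(a + gamma * y), where gamma is a user-chosen sensitivity parameter that cannot be estimated from the data. The sketch below, with hypothetical names and toy values, recomputes a weighted mean over a grid of gamma values; the intercept is calibrated so that the weights sum to the total sample size.

```python
import numpy as np


def tilted_mean(y_obs, n_total, gamma):
    """Sensitivity-adjusted mean under an assumed MNAR selection model.

    Assumes P(observed | y) = expit(a + gamma * y), so the inverse
    probability weight is 1 + exp(-a) * exp(-gamma * y). The unknown
    exp(-a) is calibrated so that the weights sum to n_total.
    gamma = 0 corresponds to outcome-independent missingness and
    returns the complete-case mean.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    n_obs = len(y_obs)
    tilt = np.exp(-gamma * y_obs)
    scale = (n_total - n_obs) / tilt.sum()   # calibrated exp(-a)
    weights = 1.0 + scale * tilt
    return float(np.sum(weights * y_obs) / n_total)


# Toy data: 4 observed outcomes out of 10 patients (hypothetical values).
toy_y = [6.5, 7.0, 8.0, 9.5]
gamma_grid = [-0.5, 0.0, 0.5]
sensitivity = {g: tilted_mean(toy_y, n_total=10, gamma=g) for g in gamma_grid}
```

A positive gamma encodes the assumption that patients with higher values are more likely to be measured, which pulls the adjusted mean below the complete-case mean; reporting the estimate across the grid shows how strongly conclusions depend on that assumption.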

Public Health Relevance

Electronic health records (EHR) include rich data on large populations over long periods of time and are available at relatively low cost, but data in EHR are not collected for research purposes. Missing data is extremely common in EHR and analyses that exclude patients on the basis of incomplete data are subject to selection bias. The focus of this proposal is the development of statistical methods to adjust for selection bias due to missing data in EHR-based research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Type: Predoctoral Individual National Research Service Award (F31)
Project #: 5F31DK118817-02
Application #: 9742288
Study Section: Special Emphasis Panel (ZDK1)
Program Officer: Castle, Arthur

Project Start: 2018-08-01
Project End: 2020-10-15
Budget Start: 2019-10-16
Budget End: 2020-10-15
Support Year: 2
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: Harvard University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2020 F31 DK	Adjusting for selection bias due to missing data in electronic health records-based research Thaweethai, Tanayott / Harvard University
NIH 2018 F31 DK	Adjusting for selection bias due to missing data in electronic health records-based research Thaweethai, Tanayott / Harvard University
