Current medical treatment guidelines largely rely on data from randomized controlled trials that study average effects, which may be inadequate for making individualized decisions for real-world patients. Large-scale electronic health records (EHRs) data provide unprecedented opportunities to optimize personalized treatment strategies and generate evidence relevant to real-world patients. However, there are inherent challenges in the use of EHRs, including non-experimental nature of data collection processes, heterogeneous data types with complex dependencies, irregular measurement patterns, multiple dynamic treatment sequences, and the need to balance risk and benefit of treatments. Using two high-quality EHR databases, Columbia University Medical Center's clinical data warehouse and the Indiana Network for Patient Care database, and focusing on type 2 diabetes (T2D), this proposal will develop novel and scalable statistical learning approaches that overcome these challenges to discover optimal personalized treatment strategies for T2D from real-world patients. Specifically, under Aim 1, we will develop a unified framework to learn latent temporal processes for feature extraction and dynamic patient records representation. Our approach will accommodate large-scale variables of mixed types (continuous, binary, counts) measured at irregular intervals. They extract lower-dimensional components to reflect patients' dynamic health status, account for informative healthcare documentation processes, and characterize similarities between patients.
Under Aim 2, we will develop fast and efficient multi-category machine learning methods, in order to evaluate treatment propensities and adaptively learn optimal dynamic treatment regimens (DTRs) among the extensive number of treatment options observed in the EHRs. The methods will provide sequential decisions that determine the best treatment sequence for a T2D patient given his/her EHRs.
Under Aim 3, we will develop statistical learning methods to assist multi-faceted treatment decision-making, which balances risks versus benefits when evaluating a DTR. Our approach will ensure maximizing benefit to the greatest extent while controlling all risk outcomes under the safety margins. For all aims, we will develop efficient stochastic resampling algorithms to scale up the optimization for massive data sizes. We will identify optimal DTRs for T2D using the extracted information from patients' comorbidity conditions, medications, and laboratory tests, as well as records-collection processes. Our methodologies will be applied and cross-validated between the two EHR databases. The treatment strategies learned from the representative EHR databases with a diverse patient population will be beneficial for individual patient care, assisting clinicians to adaptively choose the optimal treatment for a patient. Finally, we will disseminate our methods and results through freely available software and outreach to the informatics and clinical experts at our Centers for Translational Science and elsewhere.

Public Health Relevance

This proposal aims to develop novel and scalable statistical learning methods to analyze electronic health records (EHRs) and use two real-world, high-quality EHR databases for personalized medicine research. The methods will handle the non-experimental nature of data collection processes, along with heterogeneous data types, dynamic treatment sequences, and the trade-off between benefit and risk outcomes. The results will complement the current knowledge base for individual patient care using evidence generated from patients in real-world clinical practices.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM124104-03
Application #
9891071
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2018-04-01
Project End
2022-03-31
Budget Start
2020-04-01
Budget End
2021-03-31
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599