Current medical treatment guidelines largely rely on data from randomized controlled trials that study average effects, which may be inadequate for making individualized decisions for real-world patients. Large-scale electronic health records (EHRs) data provide unprecedented opportunities to optimize personalized treatment strategies and generate evidence relevant to real-world patients. However, there are inherent challenges in the use of EHRs, including the non-experimental nature of data collection processes, heterogeneous data types with complex dependencies, irregular measurement patterns, multiple dynamic treatment sequences, and the need to balance the risks and benefits of treatments. Using two high-quality EHR databases, Columbia University Medical Center's clinical data warehouse and the Indiana Network for Patient Care database, and focusing on type 2 diabetes (T2D), this proposal will develop novel and scalable statistical learning approaches that overcome these challenges to discover optimal personalized treatment strategies for T2D from real-world patients. Specifically, under Aim 1, we will develop a unified framework to learn latent temporal processes for feature extraction and dynamic patient records representation. Our approach will accommodate large-scale variables of mixed types (continuous, binary, counts) measured at irregular intervals. It will extract lower-dimensional components to reflect patients' dynamic health status, account for informative healthcare documentation processes, and characterize similarities between patients.
Under Aim 2, we will develop fast and efficient multi-category machine learning methods to evaluate treatment propensities and adaptively learn optimal dynamic treatment regimens (DTRs) among the extensive number of treatment options observed in the EHRs. The methods will provide sequential decisions that determine the best treatment sequence for a T2D patient given his or her EHRs.
Under Aim 3, we will develop statistical learning methods to assist multi-faceted treatment decision-making, which balances risks against benefits when evaluating a DTR. Our approach will maximize benefit to the greatest extent possible while keeping all risk outcomes within the safety margins. For all aims, we will develop efficient stochastic resampling algorithms to scale up the optimization for massive data sizes. We will identify optimal DTRs for T2D using the information extracted from patients' comorbidity conditions, medications, and laboratory tests, as well as records-collection processes. Our methodologies will be applied and cross-validated between the two EHR databases. The treatment strategies learned from the representative EHR databases with a diverse patient population will be beneficial for individual patient care, assisting clinicians to adaptively choose the optimal treatment for a patient. Finally, we will disseminate our methods and results through freely available software and outreach to the informatics and clinical experts at our Centers for Translational Science and elsewhere.

Public Health Relevance

This proposal aims to develop novel and scalable statistical learning methods to analyze electronic health records (EHRs) and use two real-world, high-quality EHR databases for personalized medicine research. The methods will handle the non-experimental nature of data collection processes, along with heterogeneous data types, dynamic treatment sequences, and the trade-off between benefit and risk outcomes. The results will complement the current knowledge base for individual patient care using evidence generated from patients in real-world clinical practices.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
University of North Carolina Chapel Hill
Biostatistics & Other Math Sci
Schools of Public Health
Chapel Hill
United States