Diabetic patients are at risk of developing diabetic heart disease, which may lead to complications in care. Patients with diabetic heart disease not only have exceptionally high healthcare expenditures and resource utilization but are also likely to have poor outcomes. Studies have shown that early intervention in patients likely to develop diabetic heart disease is cost-effective and yields favorable health outcomes. Therefore, early identification of diabetic patients at high risk of developing diabetic heart disease is crucial for providing effective interventions. The commonly accepted methodology for diabetic heart disease risk prediction is the use of one or more risk scoring systems. However, these risk functions may not generalize well to diabetic patients and may suffer from poor calibration when applied to different cohorts. Moreover, the scoring systems have been studied only for coronary heart disease, one variant of diabetic heart disease, while heart failure and diabetic cardiomyopathy remain important yet insufficiently studied problems. Machine learning offers the ability to perform accurate predictive analytics and has been proposed as a way to identify and manage high-risk patients. The primary goal of this proposal is to develop a high-impact, practical risk prediction model that can be used for early identification of diabetic patients at high risk of developing diabetic heart disease. Given the heterogeneity and complexity of patient information in electronic health records, the model needs to capitalize on the multi-dimensional, temporal nature of patient records to extract the characteristics that distinguish patients who will develop diabetic heart disease. To accomplish this, we will leverage modern machine learning approaches such as tensor factorization and natural language processing to model complex patient characteristics, provide a more complete representation of the patient, and uncover strong predictors of diabetic heart disease risk.
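The abstract does not specify how the longitudinal records will be arranged or which factorization will be used. As one illustration only, assume a hypothetical three-way count tensor indexed by patient, clinical feature (for example, diagnosis or medication codes), and time window. A rank-R CP (CANDECOMP/PARAFAC) decomposition fit by alternating least squares approximates each entry x[i, j, k] by the sum over r of a[i, r] * b[j, r] * c[k, r], so each row of the patient factor matrix `a` is a compact summary of that patient's record. The function names are hypothetical, and this sketch omits the nonnegativity constraints and missing-data handling a real EHR pipeline would likely need.

```python
import numpy as np


def khatri_rao(b, c):
    """Column-wise Kronecker product; row index is j * c.shape[0] + k."""
    r = b.shape[1]
    return np.einsum("jr,kr->jkr", b, c).reshape(-1, r)


def reconstruct(a, b, c):
    """Rebuild the 3-way tensor from CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def cp_als(x, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares.

    Hypothetical layout: x has shape (patients, clinical features, time windows).
    Returns factor matrices (a, b, c) with `rank` columns each.
    """
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a = rng.random((i, rank))
    b = rng.random((j, rank))
    c = rng.random((k, rank))
    # Mode-n unfoldings whose column order matches khatri_rao above.
    x1 = x.reshape(i, j * k)                     # columns: (feature, time)
    x2 = x.transpose(1, 0, 2).reshape(j, i * k)  # columns: (patient, time)
    x3 = x.transpose(2, 0, 1).reshape(k, i * j)  # columns: (patient, feature)
    for _ in range(n_iter):
        # Each update is a linear least-squares solve with the other two factors fixed;
        # the Gram matrix of a Khatri-Rao product is the Hadamard product of Gram matrices.
        a = x1 @ khatri_rao(b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = x2 @ khatri_rao(a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = x3 @ khatri_rao(a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
    return a, b, c
```

For instance, `a, b, c = cp_als(x, rank=10)` on a patients by codes by quarters tensor would give each patient a 10-dimensional representation in `a`, which a downstream classifier could use alongside other features.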
We will use an existing dataset containing the de-identified electronic health records of approximately 4,100 diabetic patients from the Emory Healthcare System to compare the predictive power of machine learning-based algorithms with that of standard risk scoring systems. These algorithms will be evaluated on calibration, discrimination, and ease of interpretability. The results of this work will provide insight into how to develop a machine learning-based prediction system that can identify diabetic patients at high risk of developing diabetic heart disease. The study may also shed light on the best approaches for fusing data from multiple heterogeneous sources to build a better predictive model, and may identify novel indicators of high diabetic heart disease risk. Moreover, the work will help inform a larger multi-site study of diabetic heart disease risk prediction and develop methods to generalize the results to a broader spectrum of comorbidities. This project is consistent with the National Library of Medicine's mission to translate biomedical research into practice.
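The abstract names calibration and discrimination as evaluation criteria but not the specific statistics. A minimal sketch, assuming common choices: the C-statistic (area under the ROC curve) for discrimination, and the Brier score plus a binned predicted-versus-observed table for calibration. The function names and toy inputs are hypothetical, and interpretability is not covered here.

```python
def c_statistic(y_true, y_score):
    """Discrimination: probability that a random case scores above a random non-case.

    Ties count as one half. Quadratic in the number of patients, which is
    acceptable for illustration but not an optimized implementation.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_table(y_true, y_prob, n_bins=5):
    """Equal-width probability bins: (bin index, count, mean predicted risk, observed rate)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # a prediction of exactly 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for idx, members in enumerate(bins):
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((idx, len(members), mean_pred, observed))
    return table
```

With toy labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `c_statistic` returns 0.75 because three of the four case/non-case pairs are ranked correctly. A well-calibrated model would show mean predicted risk close to the observed rate in every populated bin of `calibration_table`.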

Public Health Relevance

Diabetic patients are at risk of developing diabetic heart disease, which can lead to high healthcare expenditures, high resource utilization, and poor patient outcomes. Existing risk prediction models for diabetic heart disease can suffer from poor calibration and limited predictive accuracy. This project develops a novel, practical analytic tool to identify patients at high risk of developing diabetic heart disease.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Scientist Development Award - Research & Training (K01)
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Sim, Hua-Chuan
Emory University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States