The goal of this proposal is to develop accurate, generalizable and interpretable predictive models based on electronic health records (EHR) that detect heart failure (HF) in primary care patients one to two years before a clinical diagnosis and to translate the models for use in clinical care. Case-control datasets from two large US health systems (i.e., >13,000 incident HF cases and >120,000 controls) will be created and used to address two of the three study aims.
For Aim 1 (Improve prediction of pre-diagnostic HF and model generalizability), recurrent neural network (RNN) models will be used to improve prediction accuracy relative to prior work that was based on traditional machine learning models (e.g., random forest, lasso logistic regression). RNN models are expected to perform better because they can capture the temporal structure of EHR events.
Aim 1 will also focus on improving model generalizability (i.e., among patients within and across health systems) by leveraging RNN models and by addressing challenges caused by variation in patient-level EHR data (e.g., density of data) that is independent of a patient's actual health status.
Aim 2 (Identify and clinically validate pre-diagnostic HF phenotypes) will focus on the identification of pre-diagnostic HF phenotypes and the use of content from RNN models derived under Aim 1. Three levels of analysis will be completed. First, we will focus on pathophysiologic heterogeneity that is represented, in part, by HF with preserved ejection fraction (HFpEF) and HF with reduced ejection fraction (HFrEF), with HFpEF considered to be the more heterogeneous of the two. New methods will be developed to reliably identify pre-diagnostic HF phenotypes. Second, phenotypes will be clinically validated for reliability and coherence and compared to clinical judgement based on a review of the patient's record. Third, we will address a challenge with RNN models: they generate "black box" solutions that are seemingly uninterpretable. We propose to develop new methods to extract and represent the content from RNN models. We hypothesize that expert clinician reviewers will judge phenotype status combined with information extracted from Aim 1 RNN models to be superior for preventive care to either phenotype status combined with information extracted from traditional machine learning models or a direct review of the patient's EHR. Finally, we will prospectively validate the phenotype and RNN models using a large primary care cohort being created by Sutter and related serial biobanked blood samples.
For Aim 3, we will determine how accurately the models predict elevated biomarker levels that are known to be sensitive and specific indicators of HF disease progression.

Public Health Relevance

The proposed research will contribute valuable knowledge that will help doctors identify patients who are at high risk of incident heart failure 12 to 24 months before the actual diagnosis, going beyond what is possible when relying on traditional signs, symptoms or risk factors. Doctors will be able to provide a more targeted approach to reduce the future risk of heart failure and the risks of morbidity and accelerated mortality that high-risk patients face. Moreover, the methods developed in this study for the early detection of heart failure will help other researchers create more accurate and generalizable predictive models from electronic health record data and apply these models in clinical care.

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: High Priority, Short Term Project Award (R56)
Project #: 2R56HL116832-04
Application #: 9779100
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Nelson, Cheryl R
Project Start: 2018-09-20
Project End: 2019-08-31
Budget Start: 2018-09-20
Budget End: 2019-08-31
Support Year: 4
Fiscal Year: 2018
Total Cost:
Indirect Cost:
Name: California Pacific Medical Center Research Institute
Department:
Type:
DUNS #: 071882724
City: San Francisco
State: CA
Country: United States
Zip Code: 94107