Heart failure (HF) prevalence has increased and will continue to do so over the next 30 years, with a profound individual and societal burden. Early detection of HF may help mitigate this burden. The purpose of this proposal is to develop robust predictive models that make use of longitudinal electronic health record (EHR) data. Our long-term goal is to use such models to detect HF at an earlier stage (e.g., ACC/AHA Stages A or B) than usually occurs in primary care. We have completed extensive preliminary work using 10 years of longitudinal EHR data on primary care patients. Using text mining and machine learning tools, we have found that Framingham criteria are documented in the EHR long before more specific diagnostic studies are done. These signs and symptoms are considerably more common among incident HF cases than controls two to four years before diagnosis. Moreover, clinical, laboratory, diagnostic, and other data routinely captured in the EHR predict future HF diagnosis. We propose to extend this work on early detection of HF with the following aims:

1) To develop more sensitive and specific criteria for use of Framingham HF signs and symptoms in the early detection of HF. We have shown that positive and negative affirmations of Framingham signs and symptoms are useful in HF detection 1 to 4 years before diagnosis. We propose to address the following: a) Which Framingham signs and symptoms, and combinations thereof, are most useful for early detection? b) Are there temporal sequences and correlations among signs and symptoms that improve accuracy of detection? c) How do the criteria vary by HF subtype? We hypothesize that analysis of routinely documented signs and symptoms data will yield a clinically meaningful improvement in the accuracy of detecting HF 1 to 2 years before actual diagnosis.

2) To determine the differential improvement in accuracy of predicting HF diagnosis gained by combining common fixed-field EHR data with text data. Our preliminary work indicates that longitudinal EHR data (e.g., clinical, laboratory, health behaviors, diagnoses, use of care, etc.) are useful in predicting future HF diagnosis. Based on these findings, we recognize that increasingly sophisticated analysis will be required to identify how best to use these data to optimize predictive power. We hypothesize that the specific models and their performance will vary by HF subtype.

3) To determine how digital ECG-related measures can be used alone and in combination with other data to improve early detection of HF. Real-time access to digital ECG data affords unique opportunities to extract a diversity of measures that may be useful in primary care in the early detection of HF.

4) To develop preliminary operational protocols for early detection of HF in primary care. We will need to consider how the output from the model can be used to support clinical guidance and shared decision-making. Moreover, models need to be developed for both data-rich and data-poor settings.

The long-term goal of the proposed work is relevant to the national priority for adoption of EHRs in clinical practice and for meaningful use of such technology.

Public Health Relevance

Heart failure (HF) strikes one in five US citizens over age 40, has a profound impact on health, and is almost always detected too late to allow doctors to substantially reduce morbidity and mortality. We propose to use sophisticated analytic tools to search through patients' electronic health records for early signals of HF. The long-term goal of our work is to use such tools to detect HF early enough to allow doctors to change the course of the disease and substantially reduce HF morbidity and the risk of death.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Nelson, Cheryl R
California Pacific Medical Center Research Institute
San Francisco
United States
Choi, Edward; Schuetz, Andy; Stewart, Walter F et al. (2017) Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc 24:361-370
Choi, Edward; Bahadori, Mohammad Taha; Schuetz, Andy et al. (2016) Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. JMLR Workshop Conf Proc 56:301-318
Ng, Kenney; Steinhubl, Steven R; deFilippi, Christopher et al. (2016) Early Detection of Heart Failure Using Electronic Health Records: Practical Implications for Time Before Diagnosis, Data Diversity, Data Quantity, and Data Density. Circ Cardiovasc Qual Outcomes 9:649-658
Wang, Yajuan; Steinhubl, Steven R; deFilippi, Christopher et al. (2015) Prescription Extraction from Clinical Notes: Towards Automating EMR Medication Reconciliation. AMIA Jt Summits Transl Sci Proc 2015:188-93
Wang, Yajuan; Ng, Kenney; Byrd, Roy J et al. (2015) Early detection of heart failure with varying prediction windows by structured and unstructured data in electronic health records. Conf Proc IEEE Eng Med Biol Soc 2015:2530-3
Vijayakrishnan, Rajakrishnan; Steinhubl, Steven R; Ng, Kenney et al. (2014) Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record. J Card Fail 20:459-64
Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R et al. (2014) PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records. J Biomed Inform 48:160-70
Byrd, Roy J; Steinhubl, Steven R; Sun, Jimeng et al. (2014) Automatic identification of heart failure diagnostic criteria, using text analysis of clinical notes from electronic health records. Int J Med Inform 83:983-92
Ho, Joyce C; Ghosh, Joydeep; Steinhubl, Steve R et al. (2014) Limestone: high-throughput candidate phenotype generation via tensor factorization. J Biomed Inform 52:199-211