Cardiovascular disease (CVD) is the single largest killer in the United States for both men and women in every racial/ethnic group. Thus, accurate and systematic evaluation of CVD risk represents an aspect of Precision Medicine that will touch every patient. The CVD risk scores that are currently the standard of care are derived from research cohorts and are particularly inaccurate in women, older patients, and those with missing data. The goal of this Precision Medicine-based application is to capitalize on the depth and breadth of clinical data within electronic health record (EHR) systems to revolutionize CVD risk prediction, thereby optimizing personalized care for every patient. Our proposed approach is innovative in that we have identified and addressed the most significant barriers to development of an EHR-based risk score. Novel aspects of this research include: 1) use of complete EHR data to develop and validate algorithms to define a variety of risk factors (e.g., reproductive history), thus building a comprehensive risk profile for each patient that incorporates diagnosis and procedure codes, laboratory values, clinical test results, patient-provided information (e.g., alcohol use), and natural language processing of unstructured clinical text; 2) incorporation of age at onset of risk factors; 3) use of highly flexible machine learning techniques in the form of generalized boosted regression modeling; 4) exploration of a new deep learning model for censored EHR data; and 5) determination of the extent of risk reclassification in multiple geographically defined populations, including an underserved minority population. Furthermore, genetic studies demonstrate that incorporating variants into current risk models improves risk prediction, and use of an individual's genetic risk could further enhance our ability to deliver precision medicine to every patient.
Therefore, we seek to develop a sex-specific next-generation CVD risk prediction score using EHR data in combination with genetic variants. This paradigm is a significant departure from the current one, which relies on scores derived from relatively small research cohorts; those scores use only a restricted set of clinical parameters and differentially misclassify an individual's risk, especially in women. Our access to empirical clinical EHR data for hundreds of thousands of patients uniquely positions us to 1) develop a sex-specific risk prediction model for incident CVD using data from the EHR; 2) assess the performance of the sex-specific EHR risk score in an independent non-urban and rural population; and 3) identify and characterize patients for whom genetic information improves CVD prediction beyond the clinical risk score. Successful completion of these aims has the potential to impact all adult patients, drive clinical practice changes to systematically collect sex-specific risk factors, and inform attempts to embed the next-generation CVD risk score into EHR systems for automated use in clinical care.

Public Health Relevance

Cardiovascular disease (CVD) is the single largest killer in the United States. We propose to use electronic health record data to improve our ability to accurately classify risk and identify those who would benefit from preventive therapies. Improved risk prediction will shed light on the mechanisms of CVD and potentially reduce incidence, save lives, and lower health care costs.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Hsu, Lucy L
Mayo Clinic, Rochester
United States