Cardiovascular disease negatively affects millions of people worldwide. Globally, it accounts for approximately thirty percent of all deaths. Furthermore, a significant fraction of deaths caused by cardiovascular disease occurs in a non-geriatric population; fifteen percent of all worldwide deaths are attributed to cardiovascular disease in people under the age of seventy. Treatment to prevent cardiovascular events should be based on highly individualized risk prediction. High-risk patients should get more aggressive treatments because the risk of disease outweighs the burden of treatment, while low-risk patients should be managed more conservatively. For example, anti-thrombotic therapy for coronary heart disease may increase bleeding risk and may not be appropriate for low-risk patients. Two primary kinds of cardiovascular disease are stroke and coronary heart disease, and a number of risk scores have been developed for both. However, these risk scores use only a small fraction of the available measurements about a patient and treat risk as a collection of independent factors rather than considering how interactions among those factors amplify or ameliorate risk. Moreover, a majority of the popular coronary heart disease and stroke risk scores are designed to be manually computed by a busy physician at the point of care, which further limits their scope and fidelity. Next-generation risk scores for stroke and cardiovascular disease should take into account all of the available information in the electronic health record without the constraints of the parametric assumptions of traditional risk modeling. More accurate risk assessment of coronary heart disease and stroke will lead to better care and reduce the cardiovascular disease burden.
Our vision is to capitalize on large collections of electronic health records along with recent advances in deep learning to build risk scores that use more of the available health information while making minimal mathematical assumptions about the nature of clinical risk. Our proposal propels the field from human-computable, independent risk calculations necessitated by previous limitations of technology to calculations that use deep learning to learn highly nonlinear risks and risk factor interactions. We additionally demonstrate how deep learning can be used to deal with the ever-present issue of missing values in medicine. Our proposal also targets an area underexplored by previous work on risk scores: fairness. Treatment quality is affected by the quality of risk estimation. This means populations for which estimated risk is less accurate may receive worse care. Risk scores developed with simple models may capture risk accurately only for the majority population, as simple models are not flexible enough to cover multiple populations. We seek to identify potential differences in risk calculation with respect to race and ethnicity. We will construct and evaluate deep learning methods for coronary heart disease and stroke risk assessment from electronic health records. We will develop techniques to incorporate clinical text, handle missing data, and evaluate the fairness of deep learning for cardiovascular risk scores. Finally, we will disseminate our work as open-source code written in deep learning frameworks, in presentations at clinical conferences, and in publications.
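As a rough illustration of the fairness concern above (risk estimates that are less accurate for some populations), the sketch below compares how well a risk score discriminates between patients with and without events within each group. It is not the project's method: the function names `auc` and `auc_by_group`, the group labels "A" and "B", and the toy data are all hypothetical, and AUC is only one of several accuracy measures that could be compared across groups.

```python
from collections import defaultdict


def auc(labels, scores):
    """Probability that a random positive case scores above a random
    negative case, counting ties as 0.5 (the usual AUC definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute AUC separately for each group label (hypothetical helper)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Toy, made-up data: 1 = event occurred, scores = predicted risk.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.1, 0.4, 0.6, 0.7, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

result = auc_by_group(labels, scores, groups)
print(result)
```

In this toy data the score separates events from non-events better in group A than in group B. A gap of that kind is the sort of difference a fairness evaluation would flag for further study.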

Public Health Relevance

Coronary heart disease and stroke create a large burden of disease in the world and require accurate risk assessment to aid in providing appropriate treatments. We will construct and validate deep learning risk scores for both coronary heart disease and stroke using large quantities of information from the electronic health record and take care to evaluate potential racial and ethnic biases present in the algorithm. More accurate coronary heart disease and stroke risk scores provided by this approach have the potential to reduce the overall cardiovascular disease burden with a specific focus on minority communities.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project (R01)
Project #
1R01HL148248-01
Application #
9801981
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Wolz, Michael
Project Start
2019-09-01
Project End
2024-06-30
Budget Start
2019-09-01
Budget End
2020-06-30
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
New York University
Department
Biostatistics & Other Math Sci
Type
Organized Research Units
DUNS #
041968306
City
New York
State
NY
Country
United States
Zip Code
10012