Stroke is a highly heterogeneous and complex disease and a leading cause of morbidity and mortality worldwide. Identifying the cause of a stroke is essential for risk stratification and optimal treatment, but it can be difficult: up to 35% of cases have an undetermined cause under traditional subtyping criteria, and very few causative genetic variants have been found. In addition, certain causes, such as an adverse drug reaction, may be hidden within a patient's clinical picture. Using data-driven approaches to analyze patients' medical records may uncover novel patterns of risk factors and clinical features leading to stroke. The long-term goal of this research is to identify novel subtypes of highly heterogeneous diseases such as stroke and to reduce the genetic heterogeneity of a disease cohort by identifying patients with the same subtype. The objective of this application is to propose a pipeline that applies a data-driven analysis of medical notes to identify novel subtypes of stroke, focuses on a subtype caused by an adverse drug reaction or drug-drug interaction, validates the subtype in a genotyped study cohort, and looks for gene variant enrichment in this cohort. This application's central hypothesis is that applying deep learning to the electronic health records (EHR) of acute ischemic stroke patients will yield subtypes that are based on more granular information than current subtyping criteria use and that have reduced genetic heterogeneity, because the approach identifies novel patterns of risk factors and clinical features leading to the stroke. In addition, we hypothesize that at least one subtype will identify patients whose stroke is due to an adverse drug reaction or drug-drug interaction. To do this, Aim 1 will first identify all acute ischemic stroke patients in the EHR by developing a machine learning classifier trained on structured data in the EHR.
Aim 2 will then build and train an unsupervised deep learning algorithm on text from medical notes to identify clusters, or subtypes, of patients with similar clinical pictures.
Aim 3 will finally validate the reduction in genetic heterogeneity of these cohorts by estimating the observational heritability of all subtypes using a tool created in our lab and comparing these estimates with the heritability estimates of subtypes derived from physician-based criteria. It will also focus on an understudied subtype, stroke due to an adverse drug reaction or drug-drug interaction, by identifying its enrichment in the novel subtypes, validating this subtype in a study cohort with genotyped data, and finally looking for enrichment of pharmacogenetic variants in this subtype.
These aims will generate a computational pipeline that identifies novel subtypes of acute ischemic stroke, enabling improved future genetic studies by reducing genetic heterogeneity of cohorts and improved understanding of the underlying causes of the disease.

Public Health Relevance

Stroke is a highly complex disease that is a leading cause of death and severe disability. Identifying the cause of stroke is essential for risk stratification and optimal treatment, but up to a third of cases are of undetermined cause, and very few genetic variants have been identified due to the heterogeneity of the disease. To address this, this research proposal develops a computational pipeline that applies data-driven deep learning to medical notes to identify novel combinations of environmental, genetic, and medical risk factors causing the disease; this tool may clarify the etiology of stroke cases of undetermined cause, provide cohorts of patients enriched with similar genetic variants, and be expanded to subtype any disease.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Individual Predoctoral NRSA for M.D./Ph.D. Fellowships (ADAMHA) (F30)
Project #
5F30HL140946-02
Application #
9772123
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Purkiser, Kevin
Project Start
2018-09-01
Project End
2021-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Columbia University (N.Y.)
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032