One hundred and twenty-three Americans die by suicide every day, and 800,000 individuals die by suicide globally every year. Five times as many people attempt suicide, and 10-25 times as many contemplate suicide every year. Rates of suicidal ideation and suicidal behaviors are increasing. Suicidal ideation alone causes mental and physical harm and is associated with worse status of other illnesses. Clinical providers often document suicidal ideation in their notes, but it has been shown to appear in diagnostic or billing codes only 3% of the time. Historical suicide attempts are also under-captured by billing codes alone. Improving identification of those with suicidal ideation and attempts might enhance prevention through earlier contact with those at risk. A growing literature shows that clinical predictive models built on longitudinal electronic health records (EHR) can predict suicide attempts with good performance. These models have also been used by groups like ours to improve the power of large-scale genetic analyses of suicide attempt risk. The investigators used their validated models to identify the signal for suicide attempt, a "phenotype", and to run genetic analyses showing that suicide attempt risk is 4% heritable. This team also showed that suicide attempt risk was significantly genetically correlated with depressive symptoms, neurotic symptoms, and schizophrenia. The investigators propose to validate a phenotype of suicidal ideation and to improve their existing phenotypes of attempt risk to power large-scale genetic analyses across two major biobanks, Vanderbilt's BioVU and the UK Biobank. They will use natural language processing (NLP) and analytics on Vanderbilt's EHR to develop and test a phenotype of suicidal ideation. They will also use NLP to improve capture of suicide attempt cases and thereby refine existing algorithms. They will apply these phenotypes at scale to BioVU.
Their Stanford team members will use patient-reported suicidal ideation histories in another major biobank, UK Biobank, to independently run genetic analyses of suicidal ideation risk in a different population. They will further analyze clinical and genetic risk factors to better understand who will transition from suicidal ideation to suicide attempt. The project combines expertise in clinical informatics, machine learning, and large-scale genomics, as well as domain-specific expertise in suicide risk research. Spanning two major biobanks across two countries, the algorithms and methods developed have maximal portability, facilitating next-step investigations. Successful identification of suicidal ideation and attempt risk might inform clinical prevention. Better understanding of risk factors that predict who will proceed from suicidal ideation to suicidal behaviors would help allocate prevention resources to those who need them most.

Public Health Relevance

Suicide is one of the 10 leading causes of death in the United States, and 10-25 times as many people contemplate suicide every year as die by it. Large gaps remain in our understanding of the genetic risk factors for suicidal ideation and attempt, as well as the genetic and clinical risk factors that contribute to the transition from ideation to attempt. This investigation will use natural language processing, machine learning, electronic health records for over 1 million people, and two major biobanks from the United States and the United Kingdom to study the clinical and genetic risk factors necessary to improve identification of patients at highest risk of suicidal behaviors.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Research Project (R01)
Study Section
Behavioral Genetics and Epidemiology Study Section (BGES)
Program Officer
Dutka, Tara
Vanderbilt University Medical Center
United States