Suicide is the tenth leading cause of death in the United States, accounting for more than 40,000 deaths annually. Despite ongoing efforts to reduce the burden of suicide and suicidal behavior, rates have remained relatively constant over the past half century. Attempts to predict suicidal behavior have relied almost exclusively on self-reporting of suicidal thoughts and intentions. This is problematic because of well-known reporting biases and because many people at high risk are motivated to deny suicidal thoughts to avoid hospitalization. Even though the majority of suicide decedents have contact with a healthcare professional in the month before their death, suicide risk is rarely detected in such cases. Efforts to identify risk factors have also been stymied by the fact that suicide is a low base-rate event, so very large samples are needed to test the complex combinations of factors that are likely to contribute to risk. The widespread adoption of longitudinal electronic health records (EHRs) has created a powerful but still underutilized resource for detecting and predicting important health outcomes. In prior work using machine learning methods to analyze structured EHR data, we developed predictive models that detect up to 45% of first-episode suicidal behavior, on average 3 years in advance. Here we aim to systematically extend and improve our EHR prediction methods in a large healthcare system (N = 4.6 million patients) by incorporating 1) external public record datasets (LexisNexis SocioEconomic Health Attribute data) that include environmental, socioeconomic, and life event information; 2) natural language processing (NLP) to leverage unstructured EHR text, including text-based scores that capture Research Domain Criteria (RDoC) domains; 3) a novel method of deriving temporal risk envelopes to capture the time-dependent effects of individual risk factors; and 4) clinical risk trajectories that incorporate ordered temporal sequences of risk factors.
We will systematically compare the performance of each of these approaches to identify optimal strategies for enhancing risk surveillance and prediction in healthcare settings. Completion of these aims would represent a crucial step towards novel, clinically deployable, and potentially transformative tools for improving outcomes for those at risk for suicide and suicidal behavior.
Suicide is the tenth leading cause of death in the United States, accounting for more than 40,000 deaths annually, and 1.4 million people attempt suicide each year. Unfortunately, efforts to identify those at high risk have had limited success. We have previously used machine learning methods to predict risk of suicidal behavior using the vast data resources available in electronic health records. The proposed research will extend and improve this work by leveraging external datasets, medical record notes, and trajectories of risk over time to support the development of novel, clinically relevant, and potentially transformative tools for improving outcomes for those at risk for suicide and suicidal behavior.