Depression is an impairing and prevalent mood disorder in adolescence, affecting 1 in 6 youth by age 18. So far, conventional statistical approaches have had limited success in delivering tools that accurately predict future depression for an individual child. The objective of this project is to build advanced non-linear machine learning algorithms that integrate information from multiple sources to deliver accurate, individualized prediction. To accomplish this objective, the research team will use 5,000 variables (biological, cognitive, socio-emotional, environmental) measured on multiple occasions between the prenatal period and age 10 in children from the Avon Longitudinal Study of Parents and Children (ALSPAC, N = 15,636). The goal is to use features from the prenatal period to age 10 to estimate the risk of reaching clinical levels of depressive symptoms between the ages of 12 and 18. The research team will pursue two specific aims.
The first aim is to build an algorithm for accurate prediction of adolescent depression by using informative features from the prenatal period until age 10 with machine learning methods that capture complex, multivariate associations. The team will use several techniques, including artificial neural networks that exploit temporal information (recurrent neural networks, Long Short-Term Memory networks), to identify constellations of highly predictive features. Based on early-life stress sensitization theory, the first hypothesis is that features from the prenatal and early postnatal periods (up to age 5) provide greater predictive power than features from ages 6-10.
The second aim is to determine whether features predicting depression are unique to depression or shared with anxiety disorder and substance use disorder. Machine learning algorithms will predict age 18 clinical diagnoses of depression, anxiety disorder, and substance use disorder. The team will test the second hypothesis that some predictive features will be unique to each disorder and some will be shared across all three disorder types (e.g., childhood trauma). By accomplishing these aims, the research team will devise a clinically useful algorithm to estimate a child's probability of developing adolescent depression. All software created for this project will be open source and made freely available online in public repositories. Algorithms that allow accurate early identification of children at risk of developing depression during adolescence would provide new avenues for preemptive interventions. This would yield enormous public health benefits by prioritizing treatment and shifting developmental trajectories away from eventual disorder for millions of individuals worldwide. To realize this potential impact on the field and society, predictive models that calculate risk in childhood with high sensitivity and specificity are needed. The proposed project aims to use robust, rigorous machine learning algorithms to take on this challenge.

Public Health Relevance

The proposed research is relevant to public health because it focuses on depression in adolescence, which is highly prevalent and impairing during this life stage and a major risk factor for suicide. The project is relevant to NIMH's mission because it aims to identify early behavioral and biological indicators of risk for depression in order to predict the onset of illness and support early preemptive interventions.

National Institutes of Health (NIH)
National Institute of Mental Health (NIMH)
Exploratory/Developmental Grants (R21)
Study Section
Child Psychopathology and Developmental Disabilities Study Section (CPDD)
Program Officer
Murphy, Eric Rousseau
University of California Davis
Schools of Arts and Sciences
United States