Early onset of alcohol use during adolescence is associated with increased probability of later alcohol dependence, polydrug abuse, victimization, conduct problems, psychiatric comorbidities, and delayed achievement of adult milestones. Methods that yield rapid, accurate, and reliable predictions of which children and teens are at risk for early onset can improve the targeting of prevention interventions and enable the concentration of resources on the most debilitating and costly cases. One promising and untapped approach to this prediction problem is machine learning (also called "statistical learning," "data mining," or "predictive modeling"), a class of techniques arising from statistics, computer science, and engineering that seeks to build data-driven predictive algorithms. These techniques are most notably distinguished from "traditional" statistical methods (e.g., ordinary least squares regression) by their strong emphasis on predicting future cases rather than explaining the current data, and thus they may offer substantial advantages over traditional approaches to identifying which children and teens will develop early onset alcohol use. This proposal will explore the potential contribution of machine learning methods by directly comparing their predictive performance to that of the traditional approach in a large-scale, multisite longitudinal study of the development of early onset alcohol use (N = 731). If machine learning methods significantly outperform the traditional approach, future directions might include the development and implementation of machine-learning-based screening methods for real-world use. If, on the other hand, machine learning methods do not outperform the traditional approach, this will suggest that, at least in the context of the present study (i.e., these predictors, timeline, and outcome), machine learning does not improve the prediction of early onset alcohol use.
Analyses will investigate whether the performance of machine learning methods varies with the nature of the predictor variables used, the age span covered, and the outcome to be predicted. To this end, the current proposal uses an extant longitudinal dataset to carry out two specific aims: (1) train five different machine learning algorithms and one traditional algorithm (ordinary logistic regression) to predict later early onset alcohol use in a subset (70%) of the data; and (2) test these six predictive algorithms on the remaining 30% of the data and directly compare their predictive performance across multiple contexts.
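
The train/test design in the two aims can be illustrated with a minimal sketch. The abstract names neither the five machine learning algorithms nor the performance metric, so everything below is an assumption for illustration only: the data are simulated (not the study's), k-nearest neighbors stands in as one hypothetical machine learning algorithm, area under the ROC curve (AUC) is assumed as the comparison metric, and the helper names (`train_test_split`, `fit_logistic`, `knn_scores`, `auc`) are hypothetical, written with the Python standard library only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_test_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into training and test index lists (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unpenalized logistic regression fit by batch gradient descent on the mean log-loss."""
    p, n = len(X[0]), len(X)
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_logistic(model, X):
    """Predicted probability of the outcome for each row of X."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def knn_scores(train_X, train_y, test_X, k=15):
    """Score each test case by the share of positive outcomes among its k nearest training cases."""
    scores = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def auc(scores, labels):
    """ROC AUC: probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Simulated data only: 731 cases, three standardized predictors, and a
    # binary outcome that depends on an interaction the linear model omits.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(731)]
    y = [int(rng.random() < sigmoid(-1 + 0.8 * a - 0.5 * c + 0.6 * a * c)) for a, c, _ in X]

    # 70% of cases are used for training, the held-out 30% only for testing.
    train, test = train_test_split(len(X), train_frac=0.7, seed=0)
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_te, y_te = [X[i] for i in test], [y[i] for i in test]

    lr_auc = auc(predict_logistic(fit_logistic(X_tr, y_tr), X_te), y_te)
    knn_auc = auc(knn_scores(X_tr, y_tr, X_te), y_te)
    print(f"logistic regression test AUC: {lr_auc:.3f}")
    print(f"k-nearest neighbors test AUC: {knn_auc:.3f}")
```

The key design point is that every algorithm is fit only on the training indices and scored only on the held-out indices, so the comparison reflects prediction of new cases rather than fit to the data used for estimation; in practice an established library (e.g., scikit-learn) would replace these hand-written helpers.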

Public Health Relevance

Prospectively predicting which children and teens are at risk for early onset alcohol use enables targeted implementation of preventive interventions. Machine learning is a promising yet untapped approach that may be well-suited to this task. This study investigates the potential of several machine learning algorithms to contribute to the rapid, accurate, and reliable identification of individuals at risk for early onset alcohol use.

Agency
National Institutes of Health (NIH)
Institute
National Institute on Alcohol Abuse and Alcoholism (NIAAA)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
5F31AA026768-02
Application #
9753696
Study Section
Special Emphasis Panel (ZAA1)
Program Officer
Zha, Wenxing
Project Start
2018-07-30
Project End
2020-07-29
Budget Start
2019-07-30
Budget End
2020-07-29
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Arizona State University-Tempe Campus
Department
Psychology
Type
Schools of Arts and Sciences
DUNS #
943360412
City
Tempe
State
AZ
Country
United States
Zip Code
85287