Problematic prescription opioid use, defined as nonmedical use, misuse, or abuse of opioid medications, is epidemic in the US. Prescription opioid overdose deaths more than quadrupled from 1999 to 2015. Health care systems and payers trying to combat the opioid epidemic lack accurate and efficient methods for identifying the individuals most at risk of problematic opioid use and overdose. As a result, they rely on broad interventions that are burdensome to patients and expensive for payers. Payers currently define high risk and target interventions (e.g., pharmacy lock-in programs) using individual risk factors, such as high opioid dosage, that prior studies identified with traditional statistical approaches. However, these approaches have significant limitations, especially when handling large datasets with numerous variables, multi-level interactions, and missing data. Moreover, the prior studies focused on identifying risk factors rather than predicting actual risk. Machine learning offers an alternative. It handles complex interactions in large datasets, uncovers hidden patterns, and yields prediction algorithms that in many cases are more precise than those developed using traditional methods. Machine learning is widely used in fields ranging from fraud detection to cancer genomics but has not yet been applied to address the opioid epidemic. Accordingly, the proposed study will apply machine learning to develop prediction algorithms that more accurately identify patients at high risk of problematic opioid use and overdose, using data sources readily available to payers and health care systems. The project will build on existing academic-state partnerships to apply novel machine learning approaches to administrative claims data for all Medicaid beneficiaries in Pennsylvania (PA) and Arizona (AZ).
The project will also link AZ Medicaid data to two further sources. The first is electronic health records, which capture clinical information (e.g., lab results, pain severity) not available in administrative data. The second is death certificate data on lethal overdoses. These data, covering 2007-2016, will be used to achieve two specific aims: (1) to develop and validate two separate prediction algorithms, one identifying patients at risk of problematic opioid use and one identifying patients at risk of opioid overdose; (2) to compare two algorithms for identifying patients at risk of problematic opioid use and opioid overdose. One integrates clinical data with Medicaid claims, and the other uses claims alone. The machine learning approaches will include random forests and TreeNet, with representative classification trees. The predictive ability (e.g., misclassification rates) of these algorithms will be compared with that of traditional statistical models. Medicaid enrollees have a high prevalence of mental health and substance use disorders (~50%) and of opioid utilization (>20%), and adequate prediction algorithms are lacking, so Medicaid is an ideal setting for the proposed project. These analyses will give the partnering Medicaid programs information and tools they can use to target interventions more precisely to prevent problematic opioid use and overdose.
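The sketch below illustrates Aim 2. It trains a small random forest on a claims-only feature set and again on a claims-plus-clinical feature set, then compares their holdout misclassification rates. It is not the study's pipeline. It runs on synthetic data, and the feature names (`high_dose`, `prior_sud`, `pain_score`), the outcome rule, and all hyperparameters are hypothetical. TreeNet (a gradient-boosting tool) and the traditional statistical comparators are not shown.

```python
import random
from collections import Counter

# Hypothetical feature sets; the abstract does not name the actual variables.
CLAIMS = ["high_dose", "prior_sud"]      # assumed to be derivable from claims
CLINICAL = CLAIMS + ["pain_score"]       # adds an assumed EHR-only field


def make_synthetic(n, rng):
    """Synthetic patients; the outcome rule is invented for illustration."""
    rows = []
    for _ in range(n):
        x = {"high_dose": rng.randint(0, 1),
             "prior_sud": rng.randint(0, 1),
             "pain_score": rng.randint(0, 10)}
        # The outcome depends on a clinical field that claims do not contain.
        y = int(x["high_dose"] == 1 and x["pain_score"] >= 6)
        rows.append((x, y))
    return rows


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, features):
    """Best (gain, feature, threshold) for a split 'x[f] <= t', or None."""
    parent = gini([y for _, y in rows])
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in rows})[:-1]:  # keep right side non-empty
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t)
    return best


def build_tree(rows, features, depth, max_features, rng):
    """Grow a depth-limited tree; leaves are 0/1, internal nodes are tuples."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Consider a random subset of features at each node, as a random forest does.
    candidates = rng.sample(features, min(max_features, len(features)))
    split = best_split(rows, candidates)
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t,
            build_tree(left, features, depth - 1, max_features, rng),
            build_tree(right, features, depth - 1, max_features, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf and return its 0/1 label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(rows, features, n_trees=25, depth=5, max_features=2, seed=0):
    """Grow n_trees trees, each on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]  # bootstrap resample
        trees.append(build_tree(boot, features, depth, max_features, rng))
    return trees


def misclassification_rate(trees, rows):
    """Share of rows where the forest's majority vote disagrees with the label."""
    errors = 0
    for x, y in rows:
        votes = sum(predict_tree(tree, x) for tree in trees)
        errors += int(int(2 * votes > len(trees)) != y)
    return errors / len(rows)


data = make_synthetic(600, random.Random(42))
train, test = data[:400], data[400:]
err_claims = misclassification_rate(random_forest(train, CLAIMS), test)
err_clinical = misclassification_rate(random_forest(train, CLINICAL), test)
print(f"claims-only error:     {err_claims:.3f}")
print(f"claims+clinical error: {err_clinical:.3f}")
```

In this synthetic outcome, risk depends on a field that only the clinical record supplies. Within each claims-defined group of patients, the claims-only forest can therefore do no better than predict the majority outcome. Real Medicaid and EHR data would not separate so cleanly, and a real comparison would add proper validation (e.g., repeated or temporal splits) rather than a single holdout.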

Public Health Relevance

Prescription opioid overdose deaths quadrupled from 1999 to 2015, and drug overdose is now the leading cause of injury deaths among adults in the United States. This project will use innovative machine learning methods and readily available data for Medicaid beneficiaries in two states hard hit by the epidemic (Pennsylvania and Arizona) to develop algorithms to accurately predict who is at risk of problematic prescription opioid use and overdose. This information will empower health systems, payers, and policymakers to more effectively target interventions to prevent prescription opioid misuse and its consequences.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section: Health Services Organization and Delivery Study Section (HSOD)
Program Officer: Thomas, David A
University of Pittsburgh
Internal Medicine/Medicine
Schools of Medicine
United States