Because of the direct effect on patient safety, the FDA, AHRQ and the Institute of Medicine have flagged post-marketing pharmacovigilance of emerging medications as a high national research priority. The FDA, the Foundation for the NIH and PhRMA have formed the Observational Medical Outcomes Partnership to develop and compare methods for identifying adverse drug events (ADEs), and the FDA has announced its Sentinel Initiative. The proposed work will develop and study machine learning for ADE identification and prediction.

ADE prediction, the easier of the two tasks, assumes that an ADE has already been identified, such as the association between Cox2 inhibitors (Cox2ib) and myocardial infarction (MI). The goal is to construct a model that accurately predicts which patients are most susceptible to the ADE, e.g., which patients are most likely to have an MI if they take a Cox2 inhibitor. Our preliminary results show that machine learning can already make such predictions at 75% sensitivity and 75% specificity.

ADE identification is more difficult than ADE prediction because there is no observed class variable. Given that a new drug has been placed on the market, this task seeks to determine whether the drug causes any previously unanticipated adverse event. Because we do not know in advance what this event is (it may not even correspond to an existing diagnosis code), the task does not fit neatly into the standard supervised learning paradigm. Our approach is to use reverse machine learning to build a post-marketing surveillance tool that predicts and/or detects adverse drug reactions from electronic medical records (EMRs) or claims data. We show, both theoretically and with preliminary empirical results, that this approach can discover one or more subgroups of patients who are characterized by previously unanticipated adverse events, that is, events that patients on the drug suffer at a higher rate than patients not on the drug.
These events do not have to correspond to previously defined ADEs.

To build and evaluate a machine learning-based system for ADE identification and prediction, this proposal addresses the following specific aims: (1) apply supervised machine learning to ADE prediction, that is, predicting which patients are most likely to suffer a known ADE if given the drug; (2) apply reverse machine learning to identify novel ADEs; and (3) provide a complete software system for machine learning-based identification and prediction of ADEs. This system will be tested both on the Marshfield Clinic's EMR, for which some preliminary results are presented in this proposal, and on real and synthetic datasets available through the Observational Medical Outcomes Partnership (OMOP).
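
The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not the proposal's method: the function names, the per-patient sets of event codes, and the additive smoothing constant are all assumptions made for illustration. The first function computes sensitivity and specificity for a binary ADE predictor. The second is a deliberately simplified stand-in for the identification task: it ranks event codes by how much more often they occur among patients on the drug than among patients not on it, without the subgroup discovery or confounding control that a real surveillance tool would need.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 means the patient had the ADE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def rank_events_by_rate_ratio(exposed: List[Set[str]],
                              unexposed: List[Set[str]],
                              smoothing: float = 0.5) -> List[Tuple[str, float]]:
    """Rank event codes by the ratio of their rate on-drug to their rate off-drug.

    Each list element is the set of event codes recorded for one patient.
    `smoothing` (an assumed constant) avoids division by zero for events
    never seen in one of the two groups.
    """
    exposed_counts = Counter(code for events in exposed for code in events)
    unexposed_counts = Counter(code for events in unexposed for code in events)
    ranked = []
    for code in set(exposed_counts) | set(unexposed_counts):
        rate_on = (exposed_counts[code] + smoothing) / (len(exposed) + 2 * smoothing)
        rate_off = (unexposed_counts[code] + smoothing) / (len(unexposed) + 2 * smoothing)
        ranked.append((code, rate_on / rate_off))
    # Highest ratio first; ties are broken alphabetically for reproducibility.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    print(sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0],
                                  [1, 1, 1, 0, 0, 0, 0, 1]))
    # Toy event sets, invented for illustration only.
    on_drug = [{"MI", "rash"}, {"MI"}, {"MI", "cough"}, {"cough"}]
    off_drug = [{"cough"}, {"rash"}, {"cough"}, set()]
    print(rank_events_by_rate_ratio(on_drug, off_drug)[0][0])
```

A ratio well above 1 only flags an event for further review. Patients on a drug differ systematically from those who are not, so a raw rate comparison like this one cannot by itself establish that the drug causes the event.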

Public Health Relevance

Adverse drug events (ADEs) carry a high cost each year in lives, health and money. Congress, the FDA, the NIH and PhRMA have responded with new initiatives for identifying and predicting occurrences of ADEs. Initiatives such as Sentinel and the Observational Medical Outcomes Partnership widely recognize that addressing ADEs requires data, standards, and methods for data analysis and mining. This proposal addresses the need for new methods both for identifying previously unanticipated ADEs and for predicting occurrences of a known ADE. It will further develop and thoroughly evaluate novel machine learning approaches to these difficult tasks.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
University of Wisconsin Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Dhami, Devendra Singh; Soni, Ameet; Page, David et al. (2017) Identifying Parkinson's Patients: A Functional Gradient Boosting Approach. Artif Intell Med (2017) 10259:332-337
P Tafti, Ahmad; Badger, Jonathan; LaRose, Eric et al. (2017) Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Med Inform 5:e51
Natarajan, Sriraam; Bangera, Vishal; Khot, Tushar et al. (2017) Markov Logic Networks for Adverse Drug Event Extraction from Text. Knowl Inf Syst 51:435-457
Kuang, Zhaobin; Thomson, James; Caldwell, Michael et al. (2016) Computational Drug Repositioning Using Continuous Self-Controlled Case Series. KDD 2016:491-500
Wu, Yirong; Abbey, Craig K; Liu, Jie et al. (2016) Discriminatory power of common genetic variants in personalized breast cancer diagnosis. Proc SPIE Int Soc Opt Eng 9787:
Burnside, Elizabeth S; Liu, Jie; Wu, Yirong et al. (2016) Comparing Mammography Abnormality Features to Genetic Variants in the Prediction of Breast Cancer in Women Recommended for Breast Biopsy. Acad Radiol 23:62-9
Kuang, Zhaobin; Thomson, James; Caldwell, Michael et al. (2016) Baseline Regularization for Computational Drug Repositioning with Longitudinal Observational Data. IJCAI (U S) 2016:2521-2528
Hebbring, Scott J; Rastegar-Mojarad, Majid; Ye, Zhan et al. (2015) Application of clinical text data for phenome-wide association studies (PheWASs). Bioinformatics 31:1981-7
Odom, Phillip; Bangera, Vishal; Khot, Tushar et al. (2015) Extracting Adverse Drug Events from Text using Human Advice. Artif Intell Med (2015) 2015:195-204
Wu, Yirong; Liu, Jie; Del Rio, Alejandro Munoz et al. (2015) Developing a clinical utility framework to evaluate prediction models in radiogenomics. Proc SPIE Int Soc Opt Eng 9416:

Showing the most recent 10 out of 28 publications