Because of its direct effect on patient safety, the FDA, AHRQ and the Institute of Medicine have flagged post-marketing pharmacovigilance of emerging medications as a high national research priority. The FDA, the Foundation for the NIH and PhRMA have formed the Observational Medical Outcomes Partnership to develop and compare methods for identifying adverse drug events (ADEs), and the FDA has announced its Sentinel Initiative. The proposed work will develop and study machine learning for ADE identification and prediction. ADE prediction, the easier of the two tasks, assumes that an ADE has already been identified, such as the association between Cox2 inhibitors (Cox2ib) and myocardial infarction (MI); the goal is to construct a model that accurately predicts which patients are most susceptible to the ADE, e.g., to having an MI if they take a Cox2 inhibitor. Our preliminary results show that machine learning can already make such predictions at 75% sensitivity and 75% specificity. ADE identification is more difficult than ADE prediction, because there is no observed class variable. Given that a new drug has been placed on the market, this task seeks to determine whether the drug causes any previously unanticipated adverse event. Because we do not know in advance what this event is (it may not even correspond to an existing diagnosis code), this task does not fit neatly into the standard supervised learning paradigm. Our approach is to use reverse machine learning to build a post-marketing surveillance tool that predicts and/or detects adverse reactions to drugs from electronic medical records (EMRs) or claims data. We show, both theoretically and with preliminary empirical results, that this approach can discover one or more subgroups of patients who are characterized by previously unanticipated adverse events, that is, events that patients on the drug suffer at a higher rate than patients not on the drug.
These events do not have to correspond to previously defined ADEs. In order to build and evaluate a machine learning-based system for ADE identification and prediction, this proposal will address the following specific aims: (1) apply supervised machine learning to the task of ADE prediction, that is, predicting which patients are most likely to suffer a known ADE if given the drug; (2) apply reverse machine learning to identify novel ADEs; (3) provide a complete software system for machine learning-based identification and prediction of ADEs. This system will be tested both on the Marshfield Clinic's EMR, for which some preliminary results are presented in this proposal, and on real and synthetic datasets available through the Observational Medical Outcomes Partnership (OMOP).
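
The abstract does not spell out how reverse machine learning works, so the following is only a minimal sketch of the underlying idea it states: looking for events that patients on the drug suffer at a higher rate than patients not on the drug. It assumes drug exposure is treated as the class label and ranks events by a smoothed rate ratio. The function name, the toy patient records and the smoothing constant are all hypothetical; this is not the proposal's actual method or data.

```python
from collections import Counter


def rank_events_by_rate_ratio(patients, min_exposed=2):
    """Rank events by how much more often they occur among exposed patients.

    patients: list of (on_drug, events) pairs, where on_drug is a bool and
    events is a set of event codes (hypothetical format, for illustration).
    Returns tuples (event, rate_exposed, rate_unexposed, ratio), highest
    ratio first.
    """
    n_exposed = sum(1 for on_drug, _ in patients if on_drug)
    n_unexposed = len(patients) - n_exposed
    if n_exposed == 0 or n_unexposed == 0:
        raise ValueError("need both exposed and unexposed patients")

    exposed_counts = Counter()
    unexposed_counts = Counter()
    for on_drug, events in patients:
        # Count each event at most once per patient.
        (exposed_counts if on_drug else unexposed_counts).update(set(events))

    ranked = []
    for event, k in exposed_counts.items():
        if k < min_exposed:
            continue  # skip events too rare among exposed patients
        # Assumed smoothing (+0.5 / +1) so that events never seen among
        # unexposed patients do not cause a division by zero.
        rate_exposed = (k + 0.5) / (n_exposed + 1)
        rate_unexposed = (unexposed_counts[event] + 0.5) / (n_unexposed + 1)
        ranked.append((event, rate_exposed, rate_unexposed,
                       rate_exposed / rate_unexposed))

    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Toy records, invented purely for illustration.
patients = [
    (True, {"chest_pain", "rash"}),
    (True, {"chest_pain"}),
    (True, {"chest_pain", "headache"}),
    (True, {"headache"}),
    (False, {"headache"}),
    (False, {"rash"}),
    (False, {"headache"}),
    (False, set()),
]

ranked = rank_events_by_rate_ratio(patients)
for event, r_exp, r_unexp, ratio in ranked:
    print(f"{event}: exposed={r_exp:.2f} unexposed={r_unexp:.2f} ratio={ratio:.2f}")
```

In this toy data, `chest_pain` ranks first because it appears only among exposed patients, while `headache` is equally common in both groups. A real surveillance tool would also need to handle confounding, event timing relative to exposure, and multiple testing, none of which this sketch attempts.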

Public Health Relevance

Adverse drug events (ADEs) carry a high cost each year in life, health and money. Congress, the FDA, the NIH and PhRMA have responded with new initiatives for identifying and predicting occurrences of ADEs. Initiatives such as Sentinel and the Observational Medical Outcomes Partnership widely recognize that addressing ADEs requires data, standards and methods for data analysis and mining. This proposal addresses the need for new methods both for identifying previously unanticipated ADEs and for predicting occurrences of a known ADE. It will further develop and thoroughly evaluate novel machine learning approaches to these difficult tasks.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
University of Wisconsin Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Ye, Zhan; Mayer, John; Ivacic, Lynn et al. (2015) Phenome-wide association studies (PheWASs) for functional variants. Eur J Hum Genet 23:523-9
Hebbring, Scott J (2014) The challenges, advantages and future of phenome-wide association studies. Immunology 141:157-65
Liu, Jie; Zhang, Chunming; Burnside, Elizabeth et al. (2014) Multiple Testing under Dependence via Semiparametric Graphical Models. Proc Int Conf Mach Learn 2014:955-963
Liu, Jie; Zhang, Chunming; Burnside, Elizabeth et al. (2014) Learning Heterogeneous Hidden Markov Random Fields. JMLR Workshop Conf Proc 33:576-584
Peissig, Peggy L; Santos Costa, Vitor; Caldwell, Michael D et al. (2014) Relational machine learning for electronic health record-driven phenotyping. J Biomed Inform 52:260-70
Hebbring, S J; Schrodi, S J; Ye, Z et al. (2013) A PheWAS approach in studying HLA-DRB1*1501. Genes Immun 14:187-91