Because of the profound effect of adverse drug events (ADEs) on patient safety, the FDA, AHRQ, and the Institute of Medicine have flagged post-marketing pharmacovigilance of emerging medications as a high national research priority. The FDA, the Foundation for the NIH, and PhRMA formed the Observational Medical Outcomes Partnership (OMOP) to develop and compare methods for identifying ADEs, and the FDA announced its Sentinel Initiative. Congress created the Reagan-Udall Foundation (RUF) for the FDA in response to the FDA's own report, FDA Science and Mission at Risk, and two years ago OMOP activities were incorporated into RUF. As the FDA moves forward with its development of Sentinel, including work on Mini-Sentinel, researchers around the country need to keep developing better methods, along with better methodologies for evaluating those methods. A robust research community working on algorithms for pharmacosurveillance using electronic health records (EHRs) and claims databases will provide a substrate of ever-improving methods on which the nation's regulatory pharmacovigilance infrastructure can build. Indeed, an important motivation of OMOP and Mini-Sentinel was to spur the development of such a community. Machine learning has attracted widespread attention across a range of disciplines for its ability to construct accurate predictive models, which makes it especially well suited to the two central ADE problems: identifying ADEs from observational data, and predicting which patients are most at risk of suffering an identified ADE. Our current award has demonstrated the ability of machine learning to address both of these tasks. It has also added to the existing evidence that accounting for the temporal ordering of events, such as drug exposures and diagnoses, is critical for accurate identification and prediction of ADEs. 
The proposed work seeks to further improve these methods by building on recent advances in machine learning, by our group and by others, in graphical model learning and in explicit modeling of irregularly sampled temporal data. The latter is especially important because observational health databases, such as EHRs and claims databases, are not simple time series: patients typically do not visit the clinic at regular intervals, and their labs, vitals, and other measurements are not recorded in lockstep with one another. Building better ADE detection and prediction algorithms cannot be accomplished through machine learning research alone, even when that research takes account of related work in computer science, statistics, biostatistics, epidemiology, pharmacoepidemiology, and clinical research. Better methods are also needed for evaluation, that is, for estimating how well a new algorithm, or a new use of an existing algorithm, will perform at identifying ADEs associated with a new drug on the market, or at predicting which patients are most at risk of that ADE. More research and evaluation are also needed at the systems level: how can we best construct end-to-end pharmacovigilance systems that sit atop a large observational database and flag potential ADEs for human experts to investigate further? What kinds of information and statistics should such a system provide to those experts? 
This renewal will address the following aims: (1) improve upon machine learning methods for identification and prediction of ADEs, taking advantage of synergies between these two distinct tasks; (2) improve upon existing methods for evaluating ADE detection, building on advances in machine learning for information extraction from the scientific literature; (3) improve upon existing methods for evaluating ADE prediction, building on advances in machine learning for automated support of phenotyping and on improved methods for efficiently obtaining expert labels for borderline examples of a phenotype; and (4) use the methods developed in the first three aims to construct and evaluate an end-to-end pharmacosurveillance system integrated with the Marshfield Clinic EHR Data Warehouse. Machine learning plays a central and unifying role throughout all four aims. Our investigator team consists of machine learning researchers with experience in analysis of clinical, genomic, and natural language data (Page, Natarajan), a leading pharmacoepidemiologist with expertise in building systems to efficiently obtain expert evaluation and labeling of phenotypes (Hansen), a leader in phenotyping from EHR data (Peissig), and an MD/PhD practicing physician with years of experience and leadership in the study of ADEs (Caldwell). In addition to building on results of the prior award, we will build on our experiences with OMOP, the International Warfarin Pharmacogenetics Consortium, the DARPA Machine Reading Program, and interactions with the FDA.

Public Health Relevance

Adverse drug events (ADEs) carry a high cost each year in life, health, and money. Congress, the FDA, the NIH, and PhRMA have responded with new initiatives for identifying and predicting occurrences of ADEs. It has been widely recognized within initiatives such as Sentinel and the Observational Medical Outcomes Partnership that addressing ADEs requires data, standards, and methods for data analysis and mining. This proposal addresses the need for new methods both for identifying previously unanticipated ADEs and for predicting occurrences of a known ADE. It also addresses the need for improved evaluation and integrated systems approaches.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul
University of Wisconsin-Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Geng, Sinong; Kuang, Zhaobin; Liu, Jie et al. (2018) Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error. Uncertain Artif Intell 2018:156-166
Cheng, Ning; Rahman, Md Motiur; Alatawi, Yasser et al. (2018) Mixed Approach Retrospective Analyses of Suicide and Suicidal Ideation for Brand Compared with Generic Central Nervous System Drugs. Drug Saf 41:363-376
Dhami, Devendra Singh; Kunapuli, Gautam; Das, Mayukh et al. (2018) Drug-Drug Interaction Discovery: Kernel Learning from Heterogeneous Similarities. Smart Health (Amst) 9-10:88-100
Dhami, Devendra Singh; Soni, Ameet; Page, David et al. (2017) Identifying Parkinson's Patients: A Functional Gradient Boosting Approach. Artif Intell Med (2017) 10259:332-337
Kuang, Zhaobin; Geng, Sinong; Page, David (2017) A Screening Rule for ℓ1-Regularized Ising Model Estimation. Adv Neural Inf Process Syst 30:720-731
Kuang, Zhaobin; Peissig, Peggy; Costa, Vítor Santos et al. (2017) Pharmacovigilance via Baseline Regularization with Large-Scale Longitudinal Observational Data. KDD 2017:1537-1546
P Tafti, Ahmad; Badger, Jonathan; LaRose, Eric et al. (2017) Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Med Inform 5:e51
Natarajan, Sriraam; Bangera, Vishal; Khot, Tushar et al. (2017) Markov Logic Networks for Adverse Drug Event Extraction from Text. Knowl Inf Syst 51:435-457
Kuang, Zhaobin; Thomson, James; Caldwell, Michael et al. (2016) Computational Drug Repositioning Using Continuous Self-Controlled Case Series. KDD 2016:491-500
Burnside, Elizabeth S; Liu, Jie; Wu, Yirong et al. (2016) Comparing Mammography Abnormality Features to Genetic Variants in the Prediction of Breast Cancer in Women Recommended for Breast Biopsy. Acad Radiol 23:62-9

Showing the most recent 10 out of 35 publications