The need to monitor unintended effects of approved drugs has been highlighted by several recent high-profile events in which fatal side effects of drugs were detected after their release to market. Notoriously, the COX-2 inhibitor rofecoxib (Vioxx) was withdrawn from the market on account of evidence suggesting that treatment with the drug increased the rate of myocardial infarction. More recently, proton pump inhibitors have been associated with a host of previously undetected serious side effects, including chronic kidney disease. Statistical analyses of several sorts of data have been undertaken in an effort to mitigate the morbidity and mortality resulting from such side effects by accelerating their detection. These include data from adverse event reporting systems, electronic health records (EHRs) and administrative claims data, social media communication, and consumer search logs. Each of these sources presents challenges related to data completeness, accuracy, quality and representation, as well as the potential for bias. Though methods for combining multiple data sources show some promise as a way to address their particular inadequacies, strongly correlated drug-event pairs emerging from secondary analysis of observational data must ultimately be reviewed by domain experts to assess their implications. As the availability of the requisite expertise is limited, there is a pressing need for new methods to distinguish plausibly causal relationships from the large number of false positive associations that may emerge from large-scale analysis of observational data. In the proposed research, we will develop automated methods through which large amounts of knowledge extracted from the biomedical literature are used to constrain the parameterization of predictive models of large data sets.
These methods will leverage high-dimensional distributed vector representations of conceptual relations extracted from the literature to integrate extracted knowledge into predictive models of observational data. Our hypothesis is that the drug-event pairs predicted by such joint models will be both biologically plausible and strongly associated in the observational data, resulting in more accurate predictions than those that can be obtained through estimation of correlation from observational data alone. The developed methods will be evaluated formatively for accuracy against a set of drug/side-effect reference standards, and summatively for their ability to predict label changes such as "black box" warnings, using historical data and knowledge to estimate their "time-to-detection" of safety concerns. In addition, we will develop and evaluate an interactive interface permitting users to explore the evidence used by the resulting models to make predictions, by retrieving supporting assertions from the literature and statistics from observational data. If successful, the proposed research will provide the means to identify plausible drug-event pairs for regulatory purposes, mitigating consequent morbidity and mortality. In addition, the methods will provide a generalizable approach that can be used to apply knowledge derived from the biomedical literature to draw robust inferences from observational clinical data.
The need to monitor unintended effects of medications has been highlighted by several high-profile events in which fatal side effects of approved drugs were detected after their release to market. In the proposed research, we will develop and evaluate methods to identify biologically plausible adverse drug events using both observational data and knowledge extracted from the biomedical literature. If successful, these methods will provide the means for earlier detection of harmful drug effects, limiting consequent morbidity and mortality.