Drugs undergo extensive testing in animals and clinical trials in humans before they are marketed for widespread use in the population. Pre-market testing produces reasonably high-quality information about the efficacy of the drug as a treatment for the condition for which it was approved, but gives a very incomplete picture of the drug's safety. Post-marketing surveillance currently relies mainly on voluntary reporting to the FDA by health care professionals (and, recently, by patients themselves) through MedWatch, the FDA's safety information and adverse event reporting program. Self-reported patient information captures a valuable perspective that has been found to be of similar quality to that provided by health professionals, yet it is currently captured only via the formal MedWatch form. The overarching goal of this application is to deploy the infrastructure needed to explore the value of informal social network postings as a source of "signals" of potential adverse drug reactions soon after the drugs hit the market. We pay particular attention to the value such information might have for detecting adverse events earlier than currently possible and for detecting effects not easily captured by traditional means. Despite the significant challenge of processing colloquial text, our prototype study in this direction showed promising performance in identifying adverse reactions mentioned in these postings, with significant correlations between the effects mentioned by the public and those documented for the drugs we studied.
Specific aims to be addressed include: 1) To establish the infrastructure that enables processing of online user comments about drugs on health-related social network websites. In particular, we seek to recognize and extract mentions of adverse effects in those informal postings and to map them to standard terminology. We will build on our preliminary lexical approach for finding the mentions and propose a variation of machine learning, commonly referred to as active learning, in which the learning framework controls which instances are selected for use in the training data. We will also propose other innovative semantic approaches to normalization (mapping of the mentions to established, formal terms) and sentiment analysis (to discover whether a mention reports a positive or a negative effect). 2) To evaluate the sensitivity and specificity of the extraction and identification systems, as well as the predictive value of the extracted knowledge, through specific case studies of a set of drugs with well-known adverse reactions and by monitoring postings about a select group of drugs released since 2007. Our existing manually annotated gold standard will be expanded through a dedicated annotation effort led by a pharmacologist (Karen Smith). 3) To compare the knowledge extracted from patient comments to what is derived from the established drug safety monitoring scheme overseen by the FDA. We recognize that the data obtained through the deployed infrastructure could not, on its own, be used to define an adverse drug reaction (ADR). However, if this method is validated, it could provide useful signals to complement the already established processes and data sources.
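As an illustration of two ideas named in the aims, the sketch below shows one common form of active learning (uncertainty sampling, where the postings the current classifier is least sure about are sent for annotation) and the standard sensitivity and specificity calculation against a gold standard. This is not the project's implementation: the abstract does not say which selection strategy is used, and the function names, posting identifiers, and probability values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def select_most_uncertain(probabilities: Dict[str, float], k: int) -> List[str]:
    """Pick the k postings whose predicted probability of mentioning an
    adverse reaction is closest to 0.5 (uncertainty sampling, assumed here)."""
    ranked = sorted(
        probabilities,
        key=lambda post_id: (abs(probabilities[post_id] - 0.5), post_id),
    )
    return ranked[:k]


def sensitivity_specificity(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Compare system output with gold-standard annotations.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical classifier scores for four unlabeled postings.
scores = {"post-1": 0.95, "post-2": 0.52, "post-3": 0.10, "post-4": 0.41}
to_annotate = select_most_uncertain(scores, k=2)

# Hypothetical gold-standard labels and system predictions for five postings.
gold_labels = [True, True, True, False, False]
system_labels = [True, True, False, False, True]
sens, spec = sensitivity_specificity(gold_labels, system_labels)
```

In this toy example the two postings scored nearest to 0.5 would be routed to the annotators, and the evaluation function summarizes how often the system finds true mentions and how often it correctly rejects postings without one.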

Public Health Relevance

Adverse drug reactions are currently listed as one of the top 10 causes of death in the US. Identifying adverse effects of drugs after they are publicly marketed depends mainly on voluntary reporting to the FDA by health care professionals and, recently, by patients themselves, via a formal online form. The goal of this project is to develop the tools needed to exploit the numerous informal postings that patients make to health-related social networks such as Daily Strength as a source of signals of potential adverse drug reactions soon after the drugs hit the market. These tools could provide useful signals about adverse effects earlier than currently possible, hastening the FDA's intervention and reducing the impact that an adverse effect can have on public health.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer
Vanbiervliet, Alan
Arizona State University-Tempe Campus
Biomedical Engineering
Schools of Engineering
United States