The long-term objective of this proposal is to advance patient safety and reduce the cost of medical care by discovering novel adverse drug events (ADEs) through automated methods. We will apply natural language processing (NLP) and data mining methodologies to vast quantities of clinical data in electronic health records (EHRs) to detect novel ADE signals. ADEs are a major problem worldwide: they cause hospitalizations and deaths and impose a huge cost on health care. Continued post-marketing surveillance encompassing large and varied patient populations is therefore crucial for patient safety. EHRs contain comprehensive clinical information which, if harnessed properly, would be invaluable for pharmacovigilance. We have already demonstrated that we can accurately encode information in clinical reports using the NLP system MedLEE, and that we can accurately detect associations among clinical events using statistical methods that we developed. This is therefore an excellent opportunity to build on our research accomplishments and to advance the state of the art in pharmacovigilance. More specifically, MedLEE will be used to map comprehensive clinical information in the EHR to codified data, and statistical methods will then be used to generate an extensive knowledge base of disease-symptom, disease-drug, drug-drug, and drug-symptom associations, which will be used to discover new ADEs. Additionally, we will develop methods to determine the correct sequence of drug, disease, and symptom events, which is critical for detecting ADEs. We will also develop methods to map fine-grained concepts into higher-level concepts, which is important for optimizing the statistical methods. The performance of our discovery methods will be evaluated by testing them on drugs currently in use with known ADEs, and also by using historical rollback.
We will first focus on the discovery of short-term events using inpatient records, and then on longer-term events using records of outpatient office visits. This proposal is well positioned to overcome problems associated with existing automated methods based on spontaneous reporting databases and administrative databases. We are confident the methods will be effective because a strong infrastructure is already in place for us to build upon. Most importantly, the methodology developed in this proposal presents an excellent chance to dramatically improve patient safety and reduce costs.
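
As an illustration of the drug-symptom association step described above, the following is a minimal sketch, assuming each patient record has already been reduced to a set of coded concepts. The abstract does not say which statistics the project uses, so the 2x2 contingency table, the odds ratio, and the chi-square statistic here are stand-ins chosen for illustration. The function names and the toy concept labels (`drugX`, `rash`, and so on) are hypothetical.

```python
def association_table(records, drug, event):
    """Count a 2x2 contingency table over patient records.

    records: iterable of sets of coded concepts (hypothetical encoding).
    Returns (a, b, c, d):
      a = drug and event, b = drug only, c = event only, d = neither.
    """
    a = b = c = d = 0
    for concepts in records:
        has_drug = drug in concepts
        has_event = event in concepts
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio with 0.5 added to every cell (Haldane-Anscombe
    correction) so that empty cells do not cause division by zero."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without
    continuity correction. Returns 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy data, invented for illustration only: one set of concepts per patient.
records = [
    {"drugX", "rash"},
    {"drugX", "rash"},
    {"drugX"},
    {"rash"},
    {"drugY"},
    {"drugY"},
    {"drugY", "nausea"},
    set(),
]

table = association_table(records, "drugX", "rash")
print(table)                         # (2, 1, 1, 4)
print(f"{odds_ratio(*table):.2f}")   # 5.00
print(f"{chi_square(*table):.2f}")   # 1.74
```

A sketch like this ignores confounding and the temporal ordering of drug and symptom events, both of which the proposal identifies as critical for detecting ADEs.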

Public Health Relevance

This proposal aims to improve patient safety and reduce health care costs by developing effective methods for the discovery of new adverse drug events. Applying natural language processing to vast quantities of EHR data will harness comprehensive clinical information for this purpose, overcoming some of the limitations of current methods that rely on spontaneous reporting and administrative databases.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States
Vilar, Santiago; Uriarte, Eugenio; Santana, Lourdes et al. (2014) Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc 9:2147-63
Li, Ying; Salmasian, Hojjat; Vilar, Santiago et al. (2014) A method for controlling complex confounding effects in the detection of adverse drug reactions using electronic health records. J Am Med Inform Assoc 21:308-14
Freedberg, Daniel E; Salmasian, Hojjat; Friedman, Carol et al. (2013) Proton pump inhibitors and risk for recurrent Clostridium difficile infection among inpatients. Am J Gastroenterol 108:1794-801
Vilar, Santiago; Uriarte, Eugenio; Santana, Lourdes et al. (2013) Detection of drug-drug interactions by modeling interaction profile fingerprints. PLoS One 8:e58321
Liu, X Sherry; Wang, Ji; Zhou, Bin et al. (2013) Fast trabecular bone strength predictions of HR-pQCT and individual trabeculae segmentation-based plate and rod finite element model discriminate postmenopausal vertebral fractures. J Bone Miner Res 28:1666-78
Yadav, Kabir; Sarioglu, Efsun; Smith, Meaghan et al. (2013) Automated outcome classification of emergency department computed tomography imaging reports. Acad Emerg Med 20:848-54
Salmasian, Hojjat; Freedberg, Daniel E; Abrams, Julian A et al. (2013) An automated tool for detecting medication overuse based on the electronic health records. Pharmacoepidemiol Drug Saf 22:183-9
Harpaz, Rave; Vilar, Santiago; Dumouchel, William et al. (2013) Combing signals from spontaneous reports and electronic health records for detection of adverse drug reactions. J Am Med Inform Assoc 20:413-9
Harpaz, R; DuMouchel, W; Shah, N H et al. (2012) Novel data-mining methodologies for adverse drug event discovery and analysis. Clin Pharmacol Ther 91:1010-21
Harpaz, R; Perez, H; Chase, H S et al. (2011) Biclustering of adverse drug events in the FDA's spontaneous reporting system. Clin Pharmacol Ther 89:243-50

Showing the most recent 10 out of 17 publications