Medication-related morbidity and mortality in ambulatory care in the United States result in an estimated 100,000 deaths and $177 billion in spending annually. Post-marketing passive surveillance of outcomes associated with medication use has been recognized as a necessary component of drug safety monitoring to overcome the limitations of pre-marketing clinical trials. Information technology applied to the patient's electronic medical and therapeutic record holds promise to improve this situation by detecting alarming trends in signs and symptoms in patient populations exposed to the same medication. Currently, much of the information necessary for active drug safety surveillance is "locked" in the unstructured text of electronic records. Our long-term goal is to develop information technology to recognize and prevent drug therapy-related adverse events. Sophisticated natural language processing systems have been developed to find medical terms and their synonyms in unstructured text and use them to retrieve information. To monitor alarming trends in symptoms in medical records, we need mechanisms that allow not only accurate term and concept identification but also grouping of semantically related concepts that are not necessarily synonymous. Measures of semantic relatedness rely on existing ontologies of domain knowledge as well as large textual corpora to compute a numeric score indicating the strength of relatedness between two concepts. Our central hypothesis is that such measures will be able to make fine-grained distinctions among concepts in biomedical text and provide a foundation upon which to organize concepts into meaningful groups automatically. In particular, this proposal seeks to develop methods that leverage the medical knowledge contained within the Unified Medical Language System (UMLS) and corpora of clinical text.
Our short-term goals are to 1) develop new methods, specific to clinical text, for computing semantic relatedness; 2) integrate these specific methods for computing semantic relatedness into more general methods of natural language processing; and 3) integrate semantic relatedness into methods for identifying labeled semantic relations in clinical text. Labeled relations significantly enhance the ability of natural language processing to support accurate automatic analysis of medical information for improving patient safety. Our next step will be to develop and validate a generalizable active medication safety surveillance system that will automatically track medication exposure and alarming trends in signs and symptoms in ambulatory and hospitalized populations for a broad range of diseases.
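
As an illustration of what a numeric relatedness score between two concepts can look like, the following is a minimal sketch of a simple path-based measure over an is-a hierarchy. It is not the project's method: the toy hierarchy, the concept names, and the function names are all hypothetical and are not drawn from the UMLS. The score is 1 / (1 + number of edges on the shortest path), so identical concepts score 1.0 and more distant concepts score lower.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents); NOT taken from the UMLS.
IS_A = {
    "nausea": ["gastrointestinal symptom"],
    "vomiting": ["gastrointestinal symptom"],
    "gastrointestinal symptom": ["symptom"],
    "headache": ["neurological symptom"],
    "neurological symptom": ["symptom"],
    "symptom": [],
}


def build_undirected(is_a):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {concept: set() for concept in is_a}
    for child, parents in is_a.items():
        for parent in parents:
            graph.setdefault(parent, set()).add(child)
            graph[child].add(parent)
    return graph


def shortest_path_edges(graph, a, b):
    """Breadth-first search; returns the edge count, or None if unconnected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == b:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Score in (0, 1]: 1 / (1 + shortest path length in edges); 0.0 if no path."""
    dist = shortest_path_edges(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)


GRAPH = build_undirected(IS_A)
```

In this toy hierarchy, "nausea" and "vomiting" share a direct parent and so score higher than "nausea" and "headache", which meet only at the root. A measure of this kind uses the ontology alone; the corpus-based evidence mentioned above would need a separate component.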

Public Health Relevance

This project will a) create and validate a common open-source platform for developing and testing semantic relatedness measures, b) determine the validity of electronic medical records with respect to identification of symptoms associated with medication-related problems, and c) develop a novel methodology to aggregate adverse reaction terms used to code spontaneous post-marketing drug safety surveillance reports. The results of this project will enable more effective medication safety surveillance efforts and thus will improve patient safety.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
University of Minnesota Twin Cities
Public Health & Prev Medicine
Schools of Pharmacy
United States
Pakhomov, Serguei V S; Finley, Greg; McEwan, Reed et al. (2016) Corpus domain effects on distributional semantic modeling of medical terms. Bioinformatics 32:3635-3644
Pakhomov, Serguei V S; Jones, David T; Knopman, David S (2015) Language networks associated with computerized semantic indices. Neuroimage 104:125-37
Moon, Sungrim; McInnes, Bridget; Melton, Genevieve B (2015) Challenges and practical approaches with word sense disambiguation of acronyms and abbreviations in the clinical domain. Healthc Inform Res 21:35-42
Wang, Yan; Pakhomov, Serguei; Ryan, James O et al. (2015) Domain adaption of parsing for operative notes. J Biomed Inform 54:1-9
Pakhomov, Serguei V S; Hemmy, Laura S (2014) A computational linguistic measure of clustering behavior on semantic verbal fluency task predicts risk of future dementia in the nun study. Cortex 55:97-106
McInnes, Bridget T; Pedersen, Ted; Liu, Ying et al. (2014) U-path: An undirected path-based measure of semantic similarity. AMIA Annu Symp Proc 2014:882-91
Zhang, Rui; Pakhomov, Serguei; Melton, Genevieve B (2014) Longitudinal analysis of new information types in clinical notes. AMIA Jt Summits Transl Sci Proc 2014:232-7
Moon, Sungrim; Pakhomov, Serguei; Liu, Nathan et al. (2014) A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources. J Am Med Inform Assoc 21:299-307
Zhang, Rui; Pakhomov, Serguei; Lee, Janet T et al. (2013) Navigating longitudinal clinical notes with an automated method for detecting new information. Stud Health Technol Inform 192:754-8
McInnes, Bridget T; Pedersen, Ted (2013) Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text. J Biomed Inform 46:1116-24

Showing the most recent 10 out of 24 publications