The long-term aim of this project is to use natural language processing (NLP) to help realize the full potential of the Electronic Medical Record (EMR). Our research applies advanced NLP techniques to: 1) extract and encode information in textual reports; 2) map terms to an authoritative vocabulary; 3) obtain comprehensive domain coverage by processing domain corpora; and 4) facilitate vocabulary development by providing visualization tools based on the Extensible Markup Language (XML). MedLEE, the NLP system we developed, has already been shown to extract and codify information in the EMR accurately. The current project builds on our experience with MedLEE and uses it to accomplish the latter three goals, which concern vocabulary development and standardization.

More specifically, MedLEE will be used to map source terms to concepts in the Unified Medical Language System (UMLS). MedLEE will process and structure both the source terms and the candidate UMLS concepts, and suitable matches will be found based on the structural similarity between the components of a source term and those of each candidate concept. This should improve on current methods, because knowing which types of modifiers match should improve the quality of the matches.

We will also use MedLEE to process a large corpus and generate structured output in XML format. Statistics will be computed from the structured output, and clinically relevant composite terms will then be detected based on the frequencies of the structures that contain the more elementary terms. Our method differs from other discovery methods in that it uses NLP techniques that identify semantic modifiers and complex relations even when the related terms are distant from each other in the text, whereas other methods rely on statistical co-occurrence data based on adjacency. The individual XML structures and statistics will be combined and mapped into a single XML tree.
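The structural matching step can be illustrated with a minimal Python sketch. It assumes that MedLEE's output for a term has already been reduced to a hypothetical `Frame`: a head (the primary finding or procedure) plus a dictionary of typed modifiers. The modifier type names, the weights in `MODIFIER_WEIGHTS`, the 0.5 threshold, and the concept identifiers are all illustrative assumptions, not MedLEE's actual output format or real UMLS CUIs. The idea shown is only that a candidate is scored by which types of modifiers agree, rather than by string overlap alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Frame:
    """Hypothetical stand-in for a MedLEE-style structured term:
    a head (primary finding or procedure) plus typed modifiers."""
    head: str
    modifiers: Dict[str, str] = field(default_factory=dict)


# Hypothetical weights: agreement on some modifier types counts more than others.
MODIFIER_WEIGHTS = {"bodyloc": 2.0, "degree": 1.0, "certainty": 0.5}


def similarity(source: Frame, candidate: Frame) -> float:
    """Score in [0, 1]: 0 if the heads differ, otherwise the weighted share
    of modifier types whose values agree in both frames."""
    if source.head != candidate.head:
        return 0.0
    types = set(source.modifiers) | set(candidate.modifiers)
    if not types:
        return 1.0  # bare heads that match exactly
    total = sum(MODIFIER_WEIGHTS.get(t, 1.0) for t in types)
    # A type present in only one frame counts as a mismatch (get() returns None).
    matched = sum(
        MODIFIER_WEIGHTS.get(t, 1.0)
        for t in types
        if source.modifiers.get(t) == candidate.modifiers.get(t)
    )
    return matched / total


def best_match(source: Frame, candidates: Dict[str, Frame],
               threshold: float = 0.5) -> Optional[Tuple[float, str]]:
    """Return (score, concept_id) of the best candidate at or above threshold."""
    scored = [(similarity(source, frame), cid) for cid, frame in candidates.items()]
    passing = [pair for pair in scored if pair[0] >= threshold]
    return max(passing) if passing else None


# Illustrative data; the IDs are placeholders, not real UMLS CUIs.
source = Frame("hypertrophy", {"bodyloc": "left ventricle", "degree": "mild"})
candidates = {
    "CONCEPT-A": Frame("hypertrophy", {"bodyloc": "left ventricle"}),
    "CONCEPT-B": Frame("hypertrophy", {"bodyloc": "right ventricle"}),
    "CONCEPT-C": Frame("atrophy", {"bodyloc": "left ventricle"}),
}
print(best_match(source, candidates))
```

Weighting by modifier type lets a body-location mismatch (left versus right ventricle) rule out a candidate that a purely lexical comparison would still consider close, while a missing degree modifier only lowers the score.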
Using an XML tree viewer, it will be possible to visualize the tree and its frequencies, navigate and manipulate the tree, and reorganize it along different axes (e.g., procedure, body location, finding). A sophisticated NLP system such as MedLEE is an ideal foundation for our proposed work in vocabulary development and standardization: medical terminology is an integral part of medical language, and a state-of-the-art NLP system is especially well equipped to handle the inherent complexities of that language.
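The corpus statistics, the single combined XML tree, and its reorganization by axis can be sketched together in Python using only the standard library (`collections.Counter` and `xml.etree.ElementTree`). This assumes each MedLEE structure has been flattened into a hypothetical dictionary mapping a role to a value (for example `finding`, `bodyloc`, `certainty`). The element and attribute names, and the `min_count` cutoff used to flag a head-plus-modifier combination as a candidate composite term, are illustrative assumptions rather than the project's actual XML schema or detection criterion. Passing a different `axis` rebuilds the same statistics under a different top-level grouping, which mirrors reorganizing the tree by finding, body location, or procedure.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_structures(structures, axis):
    """Count how often each axis value occurs, and how often each
    (axis value, modifier type, modifier value) triple occurs in one structure."""
    head_counts = Counter()
    pair_counts = Counter()
    for s in structures:
        key = s.get(axis)
        if key is None:
            continue  # structure has no value on this axis
        head_counts[key] += 1
        for mtype, mval in s.items():
            if mtype != axis:
                pair_counts[(key, mtype, str(mval))] += 1
    return head_counts, pair_counts


def build_tree(structures, axis, min_count=2):
    """Build one XML tree grouped by `axis`; modifiers seen at least
    `min_count` times with a head are flagged as candidate composite terms."""
    head_counts, pair_counts = count_structures(structures, axis)
    root = ET.Element("terms", axis=axis)
    nodes = {}
    for key, n in sorted(head_counts.items()):
        nodes[key] = ET.SubElement(root, "term", name=str(key), freq=str(n))
    for (key, mtype, mval), n in sorted(pair_counts.items()):
        ET.SubElement(nodes[key], "modifier", type=mtype, value=mval,
                      freq=str(n), composite=str(n >= min_count).lower())
    return root


# Hypothetical flattened structures (not real MedLEE output).
structures = [
    {"finding": "effusion", "bodyloc": "pleural"},
    {"finding": "effusion", "bodyloc": "pleural", "certainty": "possible"},
    {"finding": "effusion", "bodyloc": "pericardial"},
    {"finding": "infiltrate", "bodyloc": "right lower lobe"},
]

by_finding = build_tree(structures, axis="finding")
by_location = build_tree(structures, axis="bodyloc")
print(ET.tostring(by_finding, encoding="unicode"))
print(ET.tostring(by_location, encoding="unicode"))
```

With this toy input, `effusion` with `bodyloc="pleural"` occurs twice and is flagged `composite="true"` (a candidate for a term such as "pleural effusion"), and in the body-location view the same co-occurrence appears under `pleural`. Because the counts come from parsed structures rather than adjacent words, the pair would be counted even if the words were far apart in the original sentence, provided the parser had linked them.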

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006274-05
Application #: 6490773
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Florance, Valerie
Project Start: 1997-07-01
Project End: 2003-12-31
Budget Start: 2002-01-01
Budget End: 2002-12-31
Support Year: 5
Fiscal Year: 2002
Total Cost: $288,252
Indirect Cost:
Organization Name: Queens College
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Arts and Sciences
DUNS #:
City: Flushing
State: NY
Country: United States
Zip Code: 11367
Penz, Janet F E; Wilcox, Adam B; Hurdle, John F (2007) Automated identification of adverse events related to central venous catheters. J Biomed Inform 40:174-82
Melton, Genevieve B; Parsons, Simon; Morrison, Frances P et al. (2006) Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 39:697-705
Zhou, Li; Tao, Ying; Cimino, James J et al. (2006) Terminology model discovery using natural language processing and visualization techniques. J Biomed Inform 39:626-36
Mendonca, Eneida A; Haas, Janet; Shagina, Lyudmila et al. (2005) Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 38:314-21
Melton, Genevieve B; Hripcsak, George (2005) Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 12:448-57
Chen, Lifeng; Friedman, Carol (2004) Extracting phenotypic information from the literature via natural language processing. Medinfo 11:758-62
Xu, Hua; Anderson, Kristin; Grann, Victor R et al. (2004) Facilitating cancer research using natural language processing of pathology reports. Medinfo 11:565-72
Liu, Hongfang; Teller, Virginia; Friedman, Carol (2004) A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 11:320-31
Tuason, O; Chen, L; Liu, H et al. (2004) Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac Symp Biocomput:238-49
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves et al. (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 11:392-402

Showing the most recent 10 out of 36 publications