Unlocking Data From Medical Records with Text Processing

Whitehead, Jennifer

Abstract

The long-term aim of this project is to use natural language processing (NLP) to help realize the full potential of the Electronic Medical Record (EMR). Our research applies advanced NLP techniques to: 1) extract and encode information in textual reports; 2) map terms to an authoritative vocabulary; 3) obtain comprehensive domain coverage by processing domain corpora; and 4) facilitate vocabulary development by providing visualization tools based on the Extensible Markup Language (XML). MedLEE, the NLP system we developed, has already been shown to extract and codify information in the EMR accurately. The current project builds on our experience with MedLEE and uses it to accomplish the latter three goals, which concern vocabulary development and standardization.

More specifically, MedLEE will be used to map source terms to UMLS concepts. MedLEE will process and structure both the source terms and the candidate UMLS concepts, and suitable matches will be found based on structural similarity between the components of the source term and those of the candidate concepts. This should improve on current methods, because knowing which types of modifiers match should raise the quality of the matches.

We will also use MedLEE to process a large corpus and generate structured output in XML format. Statistics will be computed from the structured output, and clinically relevant composite terms will then be detected based on the frequencies of the structures that contain the more elementary terms. Our method differs from other discovery methods in that we use NLP techniques that identify semantic modifiers and complex relations even when the terms are distant from each other, whereas other methods rely on statistical co-occurrence data based on adjacency. The individual XML structures and statistics will be combined and mapped into a single XML tree. It will be possible to visualize the tree and its frequencies using an XML tree viewer, to navigate and manipulate the tree, and to reorganize it along different axes (i.e., procedure, body location, finding).

A sophisticated NLP system such as MedLEE is an ideal foundation for our proposed work in vocabulary development and standardization: medical terminology is an integral part of medical language, and a state-of-the-art NLP system is well equipped to handle the inherent complexities of language.
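
The step of combining individual XML structures and their frequencies into a single tree can be illustrated with a minimal sketch. It is not MedLEE's implementation or its actual output schema: the element names (finding, bodyloc, certainty), the "v" and "count" attributes, the function names, and the sample data are all hypothetical, and a simple count threshold stands in for whatever statistics the project computes.

```python
import xml.etree.ElementTree as ET


def merge_into(parent, node):
    """Merge `node` under `parent`, counting repeated (tag, value) pairs."""
    value = node.get("v", "")
    for child in parent:
        if child.tag == node.tag and child.get("v", "") == value:
            target = child
            break
    else:
        # First time this (tag, value) pair appears under this parent.
        target = ET.SubElement(parent, node.tag, {"v": value, "count": "0"})
    target.set("count", str(int(target.get("count")) + 1))
    for sub in node:
        merge_into(target, sub)


def build_frequency_tree(xml_structures):
    """Combine per-sentence structures into one frequency-annotated tree."""
    root = ET.Element("corpus")
    for text in xml_structures:
        merge_into(root, ET.fromstring(text))
    return root


def frequent_composites(root, min_count):
    """List (head, modifier type, modifier value, count) seen often enough."""
    results = []
    for head in root:
        for mod in head:
            n = int(mod.get("count"))
            if n >= min_count:
                results.append((head.get("v"), mod.tag, mod.get("v"), n))
    return results


# Hypothetical structured output, one structure per processed sentence.
SAMPLE = [
    '<finding v="mass"><bodyloc v="lung"/></finding>',
    '<finding v="mass"><bodyloc v="lung"/><certainty v="high"/></finding>',
    '<finding v="mass"><bodyloc v="breast"/></finding>',
    '<finding v="effusion"><bodyloc v="pleura"/></finding>',
]

tree = build_frequency_tree(SAMPLE)
composites = frequent_composites(tree, min_count=2)
print(ET.tostring(tree, encoding="unicode"))
print(composites)
```

Because the modifier type is kept on each node, a candidate composite such as "mass" with body location "lung" is counted from the structure rather than from word adjacency, which is the distinction the abstract draws with co-occurrence methods. The merged tree can be opened in any XML viewer.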

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006274-05
Application #: 6490773
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Florance, Valerie

Project Start: 1997-07-01
Project End: 2003-12-31
Budget Start: 2002-01-01
Budget End: 2002-12-31
Support Year: 5
Fiscal Year: 2002
Total Cost: $288,252
Indirect Cost

Institution

Name: Queens College
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #

City: Flushing
State: NY
Country: United States
Zip Code: 11367

Related projects


NIH 2003 R01 LM	Unlocking Data From Medical Records with Text Processing	Friedman, Carol / Columbia University (N.Y.)	$280,840
NIH 2002 R01 LM	Unlocking Data From Medical Records with Text Processing	Whitehead, Jennifer / Queens College	$288,252
NIH 2001 R01 LM	Unlocking Data From Medical Records with Text Processing	Friedman, Carol / Queens College	$303,860
NIH 1999 R01 LM	Unlocking Data From Medical Records with Text Processing	Friedman, Carol / Queens College
NIH 1998 R01 LM	Unlocking Data From Medical Records with Text Processing	Friedman, Carol / Queens College
NIH 1997 R01 LM	Unlocking Data From Medical Records with Text Processing	Friedman, Carol / Queens College

Publications

Penz, Janet F E; Wilcox, Adam B; Hurdle, John F (2007) Automated identification of adverse events related to central venous catheters. J Biomed Inform 40:174-82

Melton, Genevieve B; Parsons, Simon; Morrison, Frances P et al. (2006) Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 39:697-705

Zhou, Li; Tao, Ying; Cimino, James J et al. (2006) Terminology model discovery using natural language processing and visualization techniques. J Biomed Inform 39:626-36

Mendonca, Eneida A; Haas, Janet; Shagina, Lyudmila et al. (2005) Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 38:314-21

Melton, Genevieve B; Hripcsak, George (2005) Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 12:448-57

Chen, Lifeng; Friedman, Carol (2004) Extracting phenotypic information from the literature via natural language processing. Medinfo 11:758-62

Xu, Hua; Anderson, Kristin; Grann, Victor R et al. (2004) Facilitating cancer research using natural language processing of pathology reports. Medinfo 11:565-72

Liu, Hongfang; Teller, Virginia; Friedman, Carol (2004) A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 11:320-31

Tuason, O; Chen, L; Liu, H et al. (2004) Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac Symp Biocomput :238-49

Friedman, Carol; Shagina, Lyudmila; Lussier, Yves et al. (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 11:392-402

Showing the most recent 10 out of 36 publications
