(Taken from application abstract): The long-term aim of this project is to use natural language processing methods to enhance the functionality of the electronic medical record, which is an abundant source of clinical data. Most of that data, however, is in textual form and is therefore unusable by automated clinical applications such as decision support, research, quality assurance, and outcomes assessment. By using a natural language processor to map the clinical information in the reports into structured, codified clinical data, the data will become readily accessible to subsequent automated clinical applications. We have already shown that it is possible to build an effective text processor that accurately codifies textual reports within the specialized domain of radiology. In this project we intend to build on that experience by extending the processor to a second limited domain that differs from radiology, and then to a broad domain, in order to study the feasibility of transferring the processor to all of medicine. More specifically, we will broaden the processor so that it codifies clinical information first in the physical examination section of the discharge summary and then in the entire discharge summary, where we will focus on coding diagnoses. Our work will address not only the extension of the language processor but also its scalability, the evaluation of its performance, the effort required, and its portability. In addition, because discharge summaries are so complex and comprehensive, we will have to extend the formal representational model of the clinical information and develop new natural language processing techniques and new vocabulary development tools. This work will continue to be performed within an operational clinical setting.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Queens College
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Penz, Janet F E; Wilcox, Adam B; Hurdle, John F (2007) Automated identification of adverse events related to central venous catheters. J Biomed Inform 40:174-82
Melton, Genevieve B; Parsons, Simon; Morrison, Frances P et al. (2006) Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 39:697-705
Zhou, Li; Tao, Ying; Cimino, James J et al. (2006) Terminology model discovery using natural language processing and visualization techniques. J Biomed Inform 39:626-36
Mendonca, Eneida A; Haas, Janet; Shagina, Lyudmila et al. (2005) Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform 38:314-21
Melton, Genevieve B; Hripcsak, George (2005) Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc 12:448-57
Xu, Hua; Anderson, Kristin; Grann, Victor R et al. (2004) Facilitating cancer research using natural language processing of pathology reports. Medinfo 11:565-72
Liu, Hongfang; Teller, Virginia; Friedman, Carol (2004) A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 11:320-31
Tuason, O; Chen, L; Liu, H et al. (2004) Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac Symp Biocomput :238-49
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves et al. (2004) Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 11:392-402
Liu, Hongfang; Friedman, Carol (2004) CliniViewer: a tool for viewing electronic medical records based on natural language processing and XML. Medinfo 11:639-43

Showing the most recent 10 out of 36 publications