We propose a program of research with two interlocking, foundational goals: (1) to develop and evaluate software for information extraction from clinical text corpora using existing Open Biomedical Ontologies (OBO) and (2) to develop and evaluate software for enrichment of existing biomedical ontologies from clinical text corpora. As a result of our work we will deliver the Ontology Development and Information Extraction Toolkit (ODIE), a set of software components integrated with GATE, Protégé and LexGrid that will assist researchers and ontology developers in performing these tasks. As a testbed for our work, we will focus mainly on the National Cancer Institute Thesaurus, an existing OBO ontology, but will develop many of our components to generalize to other OBO ontologies. We have chosen the domain of hematopathology as a test case because of its rich and varied source of clinical documents and the potential for our software to advance translational biomedical research in this area. However, the majority of the components that we develop will be domain-neutral and will generalize to other areas within and outside of oncology. The work we propose is significant for three contributions. First, we will develop novel methods or modify existing methods for accomplishing information extraction and ontology enrichment, and we will evaluate the performance of these alternatives. Second, we will develop and disseminate generic software resources for performing these tasks, which leverage the tools supported by the National Center for Biomedical Ontology. Third, we will contribute to the development of existing OBO ontologies. The results of this work will use OBO ontologies in fundamental ways to advance biomedicine.
This grant proposes to develop a set of computer tools to assist researchers in (1) extracting meaning from and codifying medical documents, and (2) building formal representations of knowledge from those documents. This work would benefit the general public by increasing the speed and efficiency of determining what information is in a particular medical document and by allowing automated processing of large numbers of documents. Additionally, the project would contribute software for developing other applications by helping researchers build more comprehensive ontologies. The results of this work may benefit both medical research and patient care.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-E (50))
Program Officer: Li, Jerry
University of Pittsburgh
Schools of Medicine
United States
Liu, K; Mitchell, K J; Chapman, W W et al. (2013) Formative evaluation of ontology learning methods for entity discovery by using existing ontologies as reference standards. Methods Inf Med 52:308-16
Chapman, Wendy W; Savova, Guergana K; Zheng, Jiaping et al. (2012) Anaphoric reference in clinical reports: characteristics of an annotated corpus. J Biomed Inform 45:507-21
Zheng, Jiaping; Chapman, Wendy W; Miller, Timothy A et al. (2012) A system for coreference resolution for the clinical narrative. J Am Med Inform Assoc 19:660-7
Savova, Guergana K; Chapman, Wendy W; Zheng, Jiaping et al. (2011) Anaphoric relations in the clinical narrative: corpus creation. J Am Med Inform Assoc 18:459-65
Liu, Kaihong; Hogan, William R; Crowley, Rebecca S (2011) Natural Language Processing methods and systems for biomedical ontology learning. J Biomed Inform 44:163-79
Chapman, Wendy W; Nadkarni, Prakash M; Hirschman, Lynette et al. (2011) Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc 18:540-3
Liu, K; Chapman, W W; Savova, G et al. (2011) Effectiveness of lexico-syntactic pattern matching for ontology enrichment with clinical documents. Methods Inf Med 50:397-407
Zheng, Jiaping; Chapman, Wendy W; Crowley, Rebecca S et al. (2011) Coreference resolution: a review of general methodologies and applications in the clinical domain. J Biomed Inform 44:1113-22