The engineering of ontologies that define the entities in an application area and the relationships among them has become essential for modern work in biomedicine. Ontologies help both humans and computers to manage burgeoning volumes of data. The need to annotate, retrieve, and integrate high-throughput data sets, to process natural language, and to build systems for decision support has set many communities of investigators to work building large ontologies. The Protégé system has become an indispensable open-source resource for an enormous international community of scientists, supporting the development, maintenance, and use of ontologies and electronic knowledge bases by biomedical investigators everywhere. The number of registered Protégé users has grown from 3,500 in 2002 to more than 300,000 as of this writing. The widespread use of ontologies in biomedicine and the availability of tools such as Protégé have brought the biomedical field to a new set of challenges that current technology was not designed to address: biomedical ontologies have grown in size and scope, and their creation, maintenance, and quality assurance have become particularly effort-intensive and error-prone. In this proposal, we will develop new methods and tools that will significantly aid biomedical researchers in creating and testing biomedical ontologies throughout their lifecycle. Our plan entails four specific aims. First, we will develop methods and tools that allow biomedical scientists to create ontologies directly from their source documents, such as spreadsheets, tab-indented hierarchies, and document outlines. Second, we will provide methods and tools that allow biomedical scientists to identify potential "hot spots" in their ontologies that might affect their quality. Third, we will implement a comprehensive, automated testing framework for ontologies that will assist biomedical researchers in performing ontology and data quality assurance throughout the development cycle. Fourth, we will continue to expand and support the thriving Protégé user community as it grows to include new clinicians and biomedical scientists who build the ontologies needed to support clinical care, data-driven research, and the elucidation of new discoveries.

Public Health Relevance

Protégé is a software system that helps a burgeoning user community to develop ontologies that enhance biomedical research and improve patient care. Protégé supports scientists, clinician researchers, and workers in informatics in data annotation, data integration, information retrieval, natural-language processing, electronic patient record systems, and decision-support systems. The Protégé resource provides critical semantic-technology infrastructure and expertise for biomedical research and the development of advanced clinical information systems.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Stanford University
Internal Medicine/Medicine
Schools of Medicine
United States
Kamdar, Maulik R; Walk, Simon; Tudorache, Tania et al. (2018) Analyzing user interactions with biomedical ontologies: A visual perspective. Web Semant 49:16-30
Lou, Yun; Tu, Samson W; Nyulas, Csongor et al. (2017) Use of ontology structure and Bayesian models to aid the crowdsourcing of ICD-11 sanctioning rules. J Biomed Inform 68:20-34
Gonçalves, Rafael S; Tu, Samson W; Nyulas, Csongor I et al. (2017) An ontology-driven tool for structured data acquisition using Web forms. J Biomed Semantics 8:26
Ochs, Christopher; Perl, Yehoshua; Geller, James et al. (2017) An empirical analysis of ontology reuse in BioPortal. J Biomed Inform 71:165-177