A family-based framework of quality assurance for biomedical ontologies

Perl, Yehoshua

Abstract

We will develop a family-based Quality Assurance (QA) framework for biomedical ontologies. Ontology QA is critical for increasing the use of ontologies in interdisciplinary research and in electronic health records (EHRs). We will develop computational techniques for identifying concepts with a high probability of errors, to improve the efficiency and effectiveness of ontology QA. Biomedical ontologies are large, complex knowledge representation systems that enable the integration of knowledge from different fields. The largest, best-known ontology repository is BioPortal of the National Center for Biomedical Ontology, containing more than 300 ontologies and tools for editing, browsing, and visualizing these ontologies. However, many errors have been discovered in BioPortal's ontologies. QA in BioPortal has been mostly focused on use cases and ad hoc techniques. Our computational techniques will automatically identify sets of concepts with a high likelihood of errors to empower ontology QA. In past research, we have designed many QA techniques for single ontologies and have shown that sets of complex and uncommonly classified concepts have significantly higher percentages of errors. The theoretical basis for our QA is Abstraction Networks (AbNs), which summarize ontologies in a compact way. Using AbNs, we identified many error-prone concepts. In this project, we will perform QA for whole families of ontologies. We have already identified seven preliminary families, based on structural properties. If a classification of concepts yields higher-than-usual error rates in several ontologies of a family F, then we hypothesize that the same will hold for that classification in most ontologies of F. We will build a prototype software tool (BLUOWL) for determining AbNs for each family, to support QA of its ontologies. Our primary test beds will be seven cancer-related ontologies, e.g., the National Cancer Institute Thesaurus (NCIt), with different properties and purposes. Some non-cancer ontologies will also be included. We have published preliminary QA results for four such ontologies. In evaluation studies, we will formulate and test hypotheses, statistically expressing the error expectations for various kinds of concepts. Ontology curators have been recruited to review the suspicious concepts we identify, as part of their regular QA efforts (outside of our budget). In summary, we will: (1) identify families of BioPortal ontologies based on ontology structure and design a unified methodology for deriving their abstraction networks; (2) build a software tool (BLUOWL) for QA of each family; (3) investigate which concept classifications are more likely to be erroneous in each family; and (4) evaluate our QA methodologies and conduct usability studies for BLUOWL.

Public Health Relevance

Biomedical ontologies are critical for interdisciplinary research and electronic health records (EHRs). The largest, best-known ontology repository is BioPortal of the National Center for Biomedical Ontology, containing more than 300 ontologies. However, many errors have been discovered in BioPortal's ontologies. Quality Assurance (QA) in BioPortal has been mostly focused on use cases and ad hoc techniques. We will develop a systematic, family-based framework for QA of biomedical ontologies. Abstraction Networks, which summarize ontologies in a compact way, form the theoretical basis of our QA methods. They will support the detection of sets of concepts with a high likelihood of errors, which will improve the yield of QA activities. A prototype software tool (BLUOWL) implementing our QA theory will be built.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA190779-02
Application #: 9027817
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Moser, Richard

Project Start: 2015-03-04
Project End: 2018-02-28
Budget Start: 2016-03-01
Budget End: 2017-02-28
Support Year: 2
Fiscal Year: 2016

Institution

Name: Rutgers University
Department: Biostatistics & Other Math Sci
Type: Other Specialized Schools
DUNS #: 075162990

City: Newark
State: NJ
Country: United States

Related projects


NIH 2017 R01 CA	A family-based framework of quality assurance for biomedical ontologies Perl, Yehoshua / Rutgers University	$500,182
NIH 2016 R01 CA	A family-based framework of quality assurance for biomedical ontologies Perl, Yehoshua / Rutgers University
NIH 2015 R01 CA	A family-based framework of quality assurance for biomedical ontologies Perl, Yehoshua / Rutgers University	$598,037

Publications

Leiter, Amanda; Bickell, Nina A; LeRoith, Derek et al. (2018) Statin Use and Breast Cancer Prognosis in Black and White Women. Horm Cancer 9:55-61

Zheng, Ling; Yumak, Hasan; Chen, Ling et al. (2017) Quality assurance of chemical ingredient classification for the National Drug File - Reference Terminology. J Biomed Inform 73:30-42

Perl, Yehoshua; Geller, James; Halper, Michael et al. (2017) Introducing the Big Knowledge to Use (BK2U) challenge. Ann N Y Acad Sci 1387:12-24

Elhanan, Gai; Ochs, Christopher; Mejino Jr, Jose L V et al. (2017) From SNOMED CT to Uberon: Transferability of evaluation methodology between similarly structured ontologies. Artif Intell Med 79:9-14

Ochs, Christopher; Case, James T; Perl, Yehoshua (2017) Analyzing structural changes in SNOMED CT's Bacterial infectious diseases using a visual semantic delta. J Biomed Inform 67:101-116

He, Zhe; Chen, Yan; Geller, James (2017) Perceiving the Usefulness of the National Cancer Institute Metathesaurus for Enriching NCIt with Topological Patterns. Stud Health Technol Inform 245:863-867

Halper, Michael; Perl, Yehoshua; Ochs, Christopher et al. (2017) Taxonomy-Based Approaches to Quality Assurance of Ontologies. J Healthc Eng 2017:3495723

Gallagher, E J; Zelenko, Z; Neel, B A et al. (2017) Elevated tumor LDLR expression accelerates LDL cholesterol-mediated breast cancer growth in mouse models of hyperlipidemia. Oncogene 36:6462-6471

Ochs, Christopher; Perl, Yehoshua; Geller, James et al. (2017) An empirical analysis of ontology reuse in BioPortal. J Biomed Inform 71:165-177

Ochs, Christopher; He, Zhe; Zheng, Ling et al. (2016) Utilizing a structural meta-ontology for family-based quality assurance of the BioPortal ontologies. J Biomed Inform 61:63-76

Showing the most recent 10 out of 21 publications
