We propose to develop an automatic change-suggestion (auto-suggestion) approach for the quality enhancement of biomedical terminologies. This approach can not only detect errors but also suggest changes that help identify and fix their root causes. Biomedical terminologies provide the basis for data quality in data collection, annotation, management, analysis, sharing, and reuse. They not only serve as part of the metadata standards for describing data under the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable), but also play a vital role in downstream information systems as a declarative knowledge source. Because of these and other new roles that biomedical terminologies may play, unaddressed quality issues can affect the quality of all downstream information systems and tools, including electronic health record, clinical decision support, and patient safety evaluation systems. Most existing terminology quality assurance approaches merely indicate the presence of possible quality issues and do not automatically suggest fixes. The long-term goal of this study is to develop an approach for AutomatiC Error-identification and change-Suggestion (ACES), shifting the effort of domain experts and ontology engineers from creating changes to validating suggested changes. To advance this goal, we propose three specific aims:
Aim 1. To develop an auto-suggestion reasoning framework for automatic error detection in non-lattice subgraphs by performing Formal Concept Analysis (FCA) on the logical definitions of concepts. The constructed FCA lattices will serve as logically meaningful reference structures. Comparing them with the original non-lattice subgraphs will automatically reveal potential errors and suggest remedies.
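To make the comparison in Aim 1 concrete, here is a minimal Python sketch under simplifying assumptions. It is not the ACES implementation. The toy concepts, the `PARENTS`/`INTENTS` tables, and the functions `non_lattice_pairs` and `fca_missing_isa` are hypothetical. A pair of concepts is flagged as a non-lattice pair when its common descendants have more than one maximal element. The subgraph examined here is reduced to the pair plus those maximal common descendants. Each logical definition is flattened into a set of attribute values (its FCA intent). Ordering concepts by inclusion of these sets mirrors how FCA orders object concepts, so an inclusion with no matching asserted is-a becomes a candidate missing is-a relation.

```python
from itertools import combinations

# Hypothetical toy terminology (not SNOMED CT content): asserted is-a parents,
# and each concept's logical definition flattened into a set of attribute values.
PARENTS = {
    "Disorder": set(),
    "LungDisorder": {"Disorder"},
    "Inflammation": {"Disorder"},
    "Pneumonitis": {"LungDisorder", "Inflammation"},
    "BacterialPneumonia": {"LungDisorder", "Inflammation"},  # is-a Pneumonitis missing
}
INTENTS = {
    "Disorder": set(),
    "LungDisorder": {"site=lung"},
    "Inflammation": {"morph=inflammation"},
    "Pneumonitis": {"site=lung", "morph=inflammation"},
    "BacterialPneumonia": {"site=lung", "morph=inflammation", "agent=bacteria"},
}


def ancestors(concept, parents):
    """Proper ancestors of a concept under the transitive closure of is-a."""
    seen, stack = set(), list(parents[concept])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def non_lattice_pairs(parents):
    """Concept pairs whose common descendants have more than one maximal element."""
    anc = {c: ancestors(c, parents) for c in parents}
    pairs = []
    for a, b in combinations(sorted(parents), 2):
        # Common lower bounds of a and b (a concept counts as its own descendant).
        common = {c for c in parents
                  if (c == a or a in anc[c]) and (c == b or b in anc[c])}
        # Keep only those with no other common lower bound above them.
        maximal = sorted(c for c in common if not anc[c] & common)
        if len(maximal) > 1:
            pairs.append((a, b, maximal))
    return pairs


def fca_missing_isa(subgraph, parents, intents):
    """Suggest x is-a y when y's intent is a proper subset of x's intent.

    In an FCA concept lattice the object concept of x lies below that of y
    exactly when x's intent contains y's, so intent inclusion reproduces the
    lattice order among these concepts without building the full lattice.
    """
    anc = {c: ancestors(c, parents) for c in parents}
    return sorted((x, y) for x in subgraph for y in subgraph
                  if x != y and intents[y] < intents[x] and y not in anc[x])


for a, b, maximal in non_lattice_pairs(PARENTS):
    subgraph = {a, b, *maximal}
    print(a, b, maximal, "->", fca_missing_isa(subgraph, PARENTS, INTENTS))
# Inflammation LungDisorder ['BacterialPneumonia', 'Pneumonitis'] -> [('BacterialPneumonia', 'Pneumonitis')]
```

Adding the suggested relation (BacterialPneumonia is-a Pneumonitis) leaves Pneumonitis as the only maximal common descendant, so the pair no longer violates the lattice property. A full FCA would also enumerate every formal concept of the context, which can expose missing intermediate concepts. This sketch omits that step.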
Aim 2. To develop an automated method to uncover the root causes of errors in the logical definitions of concepts and to suggest remedial changes to those definitions for evaluation. We will develop a reasoning algorithm that automatically locates the erroneous or incomplete logical definitions that give rise to the detected potential errors. Working with domain experts, we will evaluate randomly selected auto-suggestions using our web-based system to assess the effectiveness of our error detection and root-cause analysis methods.
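One plausible way to trace a suspicious is-a relation back to the definitions is to check which of the parent's defining attributes are absent from the child's definition. The Python below is a hypothetical heuristic for illustration, not the reasoning algorithm proposed in Aim 2. `STATED`, `PARENTS`, and `definition_gaps` are assumed names. Each definition is simplified to the set of attribute values stated directly on the concept, ignoring inheritance through is-a.

```python
# Hypothetical toy data: attributes stated in each concept's own logical
# definition (no inheritance through is-a), and the asserted is-a parents.
STATED = {
    "Pneumonitis": {"site=lung", "morph=inflammation"},
    "ViralPneumonitis": {"site=lung", "agent=virus"},  # morphology omitted
    "LungDisorder": {"site=lung"},
}
PARENTS = {
    "Pneumonitis": {"LungDisorder"},
    "ViralPneumonitis": {"Pneumonitis"},
    "LungDisorder": set(),
}


def definition_gaps(stated, parents):
    """For each asserted is-a (child, parent) not covered by set inclusion of
    the stated attributes, list the parent attributes missing from the child.

    Each gap points at one of two candidate root causes for expert review:
    an incomplete definition of the child, or an incorrect is-a link.
    """
    gaps = []
    for child, ps in sorted(parents.items()):
        for parent in sorted(ps):
            missing = stated[parent] - stated[child]
            if missing:
                gaps.append((child, parent, sorted(missing)))
    return gaps


for child, parent, missing in definition_gaps(STATED, PARENTS):
    print(f"{child} is-a {parent}: definition of {child} lacks {missing}")
# ViralPneumonitis is-a Pneumonitis: definition of ViralPneumonitis lacks ['morph=inflammation']
```

The heuristic deliberately reports a gap rather than a fix. Whether to add the attribute to the child or remove the is-a link is the kind of decision left to domain experts.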
Aim 3. To quantitatively assess the impact of terminology quality on queries over healthcare data for patient cohort identification. We will leverage SNOMED CT and Cerner Health Facts, a comprehensive EHR database, to measure the global impact of missing and incorrect is-a relations on clinical queries over the EHR database. Missing is-a relations reduce query recall, and incorrect is-a relations reduce query precision. Our use of non-lattice subgraphs rests on a rigorous mathematical theory, which holds that the hierarchical relation between ontological concepts should structurally conform to the mathematical property of being a lattice. Because this property does not depend on any particular terminology, ACES is generalizable to virtually all biomedical terminologies, and its expected impact is high.
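A small hypothetical example in Python illustrates the recall and precision effects described in Aim 3. It assumes that a cohort query retrieves every patient coded with the query concept or any of its descendants. It also treats a corrected hierarchy (`GOLD`) as the reference standard. The concept names, patients, and helper functions are illustrative only and are not drawn from SNOMED CT or Cerner Health Facts.

```python
from collections import defaultdict

# Hypothetical toy data: a reference (corrected) hierarchy, a flawed one with
# one missing and one incorrect is-a, and each patient's coded diagnoses.
GOLD = {"Pneumonia": set(), "BacterialPneumonia": {"Pneumonia"},
        "ViralPneumonia": {"Pneumonia"}, "Bronchitis": set()}
FLAWED = {"Pneumonia": set(), "BacterialPneumonia": {"Pneumonia"},
          "ViralPneumonia": set(),        # missing is-a -> lower recall
          "Bronchitis": {"Pneumonia"}}    # incorrect is-a -> lower precision
PATIENTS = {"p1": {"BacterialPneumonia"}, "p2": {"ViralPneumonia"},
            "p3": {"Bronchitis"}, "p4": {"Pneumonia"}}


def descendants_or_self(root, parents):
    """The root plus every concept reachable from it via inverse is-a links."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def cohort(query, parents, patients):
    """Patients with at least one code subsumed by the query concept."""
    codes = descendants_or_self(query, parents)
    return {pid for pid, coded in patients.items() if coded & codes}


def precision_recall(retrieved, relevant):
    """Set-based precision and recall (1.0 by convention for empty sets)."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 1.0
    recall = tp / len(relevant) if relevant else 1.0
    return precision, recall


relevant = cohort("Pneumonia", GOLD, PATIENTS)     # p1, p2, p4
retrieved = cohort("Pneumonia", FLAWED, PATIENTS)  # p1, p3, p4
print(precision_recall(retrieved, relevant))
# (0.6666666666666666, 0.6666666666666666)
```

The missing is-a relation drops patient p2, lowering recall. The incorrect is-a relation pulls in patient p3, lowering precision.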

Public Health Relevance

The main goal of this project is to develop a general automatic change-suggestion (auto-suggestion) framework that systematically addresses quality issues in a broad range of biomedical terminologies and automatically proposes changes. This auto-suggestion framework has the potential to uncover de novo patterns for the quality enhancement of biomedical terminologies. The quality enhancement of biomedical terminologies will in turn impact downstream clinical decision support, information retrieval, and terminology-based systems, including clinical queries over electronic health records.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013335-01
Application #
9940031
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Vanbiervliet, Alan
Project Start
2020-08-01
Project End
2022-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Texas Health Science Center Houston
Department
Type
Sch Allied Health Professions
DUNS #
800771594
City
Houston
State
TX
Country
United States
Zip Code
77030