A key barrier in cancer research is the traditional data access workflow, which requires a hypothesis before patient data can be accessed, rather than a workflow that begins with privacy-preserving data exploration. Existing query engines allow researchers to explore clinical data and to build and execute queries without needing to understand how the data is stored. However, the interfaces of such query engines have not achieved usability approaching that of consumer websites, due in critical part to the lack of faceted capabilities. Faceted systems for querying clinical data are currently unavailable because of the complexity of the data and the mismatch between the ontologies used for organizing and annotating clinical data (such as the NCI Thesaurus) and the desired facet structures and properties. We propose to overcome these challenges by developing OncoSphere, a query engine that uses the NCI Thesaurus as a nested facet system (NFS) to provide web-based exploration of Kentucky Cancer Registry data, through three Specific Aims.
In Aim 1 we will develop an approach to transform the NCI Thesaurus into an NFS and implement it, enabling OncoSphere's interface features.
In Aim 2 we will develop methods to audit the quality of the hierarchical structure of the NCI Thesaurus, improving its ability to support faceted querying in OncoSphere.
In Aim 3 we will evaluate OncoSphere's query expressiveness and query performance and conduct a preliminary usability assessment. OncoSphere will break new ground in web-based tools and capitalize on available data resources to accelerate cancer research. We expect OncoSphere and its future versions to become an invaluable resource for the cancer research community. The long-term goal of this study is to create data exploration systems for NCI's Surveillance, Epidemiology, and End Results (SEER) program and other related cancer data resources, using data science innovations to transform the user experience with a new generation of data interaction modalities.

Public Health Relevance

The main goal of this project is to develop OncoSphere, a novel ontology-driven faceted query system to enhance web-based exploration of Kentucky Cancer Registry data. Success of this study will address a fundamental barrier to making query interfaces easier to use, ultimately as easy as shopping on Amazon, and will support a broad range of cancer data exploration modalities. In the long term, this study can lead to a new generation of tools for querying data in the NCI's Surveillance, Epidemiology, and End Results program as well as other related cancer resources.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
7R21CA231904-02
Application #
9949194
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Rivera, Donna
Project Start
2018-06-06
Project End
2020-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Texas Health Science Center Houston
Department
Neurology
Type
Schools of Medicine
DUNS #
800771594
City
Houston
State
TX
Country
United States
Zip Code
77030