Bladder cancer is the sixth most common cancer in the United States, and non-muscle invasive bladder cancer (NMIBC) accounts for 75-80% of all cases. Tumor recurrence and progression are common among NMIBC patients: over 50% of patients have their tumors recur, most within the first year, and up to 45% of high-risk tumors progress to muscle-invasive disease within 5 years. Patients therefore undergo intensive clinical surveillance and treatment, which contributes to bladder cancer being the most expensive cancer to treat on a per-patient basis. Large population-based studies have been limited in their ability to study tumor recurrence and progression because these key outcomes are not typically captured in cancer registry or other discretely coded data. To overcome this limitation and facilitate future epidemiologic and outcomes studies on NMIBC, we propose to develop and validate automated algorithms using natural language processing (NLP) to capture bladder cancer recurrence (Aim 1) and progression (Aim 2) from free-text pathology, urology, and imaging notes. We will externally validate the accuracy of the algorithms for extracting tumor characteristics using a national sample of 575 patients from the Veterans Affairs (VA) healthcare system (Aim 3). NLP is a powerful tool that works by segmenting notes into units of related text (e.g., sentences) and applying computational methods to determine meaning and extract data. We will use a novel, internally developed NLP tool that integrates the best components of several open-source NLP packages to efficiently develop, refine, and validate the proposed algorithms. Kaiser Permanente Southern California (KPSC) is an ideal study setting because of its large, diverse population, advanced electronic health record, high-quality cancer registry, and complete capture of care. The initial NLP algorithms will be created based on clinical input and chart reviews of a sample of medical records.
The algorithms will first be developed using diagnostic reports, leveraging validated cancer registry data on 6,000 patients; the same clinical procedures are used for initial diagnosis as for recurrence and progression. The algorithms will then be applied to surveillance reports and iteratively refined based on false-positive and false-negative results relative to study chart reviews (n=100 for each iteration). The final algorithms will be compared to an expert reference standard provided by two urologic oncologists and a pathologist in a sample of 200 patients. Algorithm performance will be assessed by sensitivity, specificity, positive predictive value, and negative predictive value. The final algorithms will be applied to 4,000 patients aged >18 years who were newly diagnosed with NMIBC within KPSC from 2008 to 2017. The frequency of recurrence and progression will be described, and characteristics of patients with and without the outcomes will be compared. Successful completion of study aims will produce novel, automated methods that will facilitate large epidemiologic and outcomes studies, whose results may improve care for NMIBC patients.
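The four performance measures named above can be illustrated with a minimal sketch that compares per-patient algorithm output against a reference standard. This is not the study's code: the names `ConfusionCounts`, `tally`, and `performance` are hypothetical, and the example counts are invented purely for illustration.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2x2 table: algorithm result vs. reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def tally(algorithm_flags, reference_flags):
    """Count agreement cells from two equal-length sequences of booleans."""
    if len(algorithm_flags) != len(reference_flags):
        raise ValueError("both sequences must cover the same patients")
    tp = fp = fn = tn = 0
    for predicted, actual in zip(algorithm_flags, reference_flags):
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def performance(c):
    """Return sensitivity, specificity, PPV, and NPV; NaN if a denominator is 0."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else nan

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Invented counts for a 200-patient validation sample (illustration only).
example = ConfusionCounts(tp=40, fp=10, fn=5, tn=145)
metrics = performance(example)
```

With these invented counts, sensitivity is 40/45 and positive predictive value is 40/50. Positive and negative predictive values depend on how common the outcome is in the sample, so values from a validation sample would not necessarily carry over to a cohort with a different recurrence or progression rate.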

Public Health Relevance

Among non-muscle invasive bladder cancer patients, the key outcomes are tumor recurrence and progression: treatment seeks to reduce recurrence and progression, and surveillance seeks to promptly identify recurrent or progressing tumors for treatment. However, these key outcomes are not typically recorded in cancer registry or other discrete data, limiting the ability of large, population-based studies to study them. We therefore propose to develop novel, automated methods to identify bladder cancer recurrence and progression from pathology, urology, and imaging notes to facilitate large, population-based studies of bladder cancer.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21CA227606-01A1
Application #
9668002
Study Section
Cancer, Heart, and Sleep Epidemiology A Study Section (CHSA)
Program Officer
Divi, Rao L
Project Start
2019-02-01
Project End
2021-01-31
Budget Start
2019-02-01
Budget End
2020-01-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Kaiser Foundation Research Institute
Department
Type
DUNS #
150829349
City
Oakland
State
CA
Country
United States
Zip Code
94612