Identification of risk factors for sporadic colorectal cancer (CRC) and creation of a prediction model for it will help target high-risk persons for early screening. Such targeting may reduce morbidity and mortality from this particularly devastating disease in this very vital age group, and without the need to apply screening broadly to a population where non-targeted screening is likely to cause more harm than good. Screening for CRC is recommended for average-risk persons aged 50 years and older. However, 7-11% of all CRC occurs in persons < 50, most of whom have no classic risk factors at the time of diagnosis. These persons are not only younger, but often present with more advanced disease and have a less favorable prognosis than older persons. During the last 20 years, the incidence of CRC, while falling in persons 50 years old and older, has risen steadily in persons under age 50. For these reasons, it is critically important to try to identify, among Veterans (who are already a high-risk group), those < age 50 at high risk for CRC, who may be candidates for early screening. From a practical perspective, an efficient way to identify such Veterans using electronic medical record (EMR) data would facilitate implementation. Project objectives include: 1) Identify risk factors for sporadic (i.e., non-hereditary) CRC in persons < age 50; 2) Derive and validate a prediction model for quantifying absolute and relative risks for CRC; 3) Compare the accuracy of automated data abstraction using natural language processing for identifying and abstracting risk factor information from VA electronic health information to the gold standard of manual electronic medical record review. Using the VA Central Cancer Registry, we will identify incident cases of CRC diagnosed between 2008 and 2014.
We will verify case eligibility from manual review of CPRS, excluding those with inflammatory bowel disease, a high-risk family history, polyposis syndrome, or hereditary non-polyposis colon cancer syndrome. Using medical SAS datasets, we will match each final case to 4 controls during the same time period and validate the control group by using a second control group with a negative (i.e., no neoplasia) diagnostic colonoscopy. The same exclusions will apply to controls, along with previous colectomy of any extent and for any reason. Cases and controls will be matched for facility. Manual review of the EMR in VistAweb will be conducted by trained research personnel, who will identify information about candidate risk factors, including lifestyle habits (cigarette and ethanol use, occupation, leisure activity / exercise), family cancer history, BMI, socio-demographic features, certain laboratory test results, prior CRC screening test results, and medication use. Logistic regression will be used to identify independent factors associated with CRC. A prediction model will be derived and internally validated. Age- and gender-specific SEER CRC incidence rates will be used in conjunction with the prediction model to provide estimates of absolute and relative CRC risks (or colon age). Depending on the magnitude of the absolute risk and how it compares with SEER population risks, CRC screening using some screening modality may be considered. From a methodological perspective, we will create a natural language processing tool and use it to perform automated identification and abstraction on the EMRs of cases and controls, comparing its capture of information to that of manual EMR review.
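
The comparison of automated abstraction with manual EMR review could, for a single yes/no risk factor, be summarized along the following lines. This is a minimal sketch, not the project's stated analysis plan: the function name `agreement_metrics`, the choice of sensitivity, specificity, and Cohen's kappa as summary measures, and the toy data are all assumptions made for illustration, with manual review treated as the reference standard.

```python
from typing import Dict, Sequence


def agreement_metrics(manual: Sequence[bool], automated: Sequence[bool]) -> Dict[str, float]:
    """Compare automated abstraction with manual review (the reference standard).

    Each position holds one record's value for a single binary risk factor,
    e.g. "current smoker: yes/no". Hypothetical helper for illustration only.
    """
    if len(manual) != len(automated) or len(manual) == 0:
        raise ValueError("both sequences must be non-empty and of equal length")

    pairs = list(zip(manual, automated))
    tp = sum(1 for m, a in pairs if m and a)          # both say "present"
    tn = sum(1 for m, a in pairs if not m and not a)  # both say "absent"
    fp = sum(1 for m, a in pairs if not m and a)      # automated only
    fn = sum(1 for m, a in pairs if m and not a)      # manual only
    n = len(pairs)

    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    nan = float("nan")

    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "kappa": (observed - expected) / (1 - expected) if expected != 1 else nan,
    }


# Toy data (invented): eight records reviewed both ways.
manual_review = [True, True, True, False, False, False, False, True]
automated_nlp = [True, True, False, False, False, True, False, True]
result = agreement_metrics(manual_review, automated_nlp)
```

In this toy example the automated method misses one of four manually identified positives and flags one of four negatives, so sensitivity and specificity are both 0.75 and kappa is 0.5. In practice such measures would be computed separately for each candidate risk factor.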

Public Health Relevance

Screening for colorectal cancer (CRC) in adults aged 50 years and older has reduced both new cases of and deaths due to CRC. However, among persons under age 50, sporadic (i.e., non-hereditary) CRC is on the rise. Lowering the age for starting to screen has been proposed, but limited information suggests that the harms and costs would outweigh the benefits. Another option is to identify a high-risk group among persons under age 50. This project will use electronic VA databases and electronic medical record (EMR) review to compare cases (persons with non-hereditary CRC diagnosed prior to age 50) and controls (no CRC) on several candidate risk factors, with the goal of creating a tool that identifies younger Veterans at high risk who could be screened earlier than age 50. We will also compare human EMR review with automated ways of obtaining risk information from the EMR and other electronic databases to determine whether automated methods may be used to identify younger Veterans at high risk for CRC.

Agency
National Institutes of Health (NIH)
Institute
Veterans Affairs (VA)
Type
Non-HHS Research Projects (I01)
Project #
5I01HX001650-02
Application #
9927917
Study Section
HSR-1 Medical Care and Clinical Management (HSR1)
Project Start
2016-06-01
Project End
2020-03-31
Budget Start
2017-06-01
Budget End
2018-05-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Rlr VA Medical Center
Department
Type
DUNS #
608434697
City
Indianapolis
State
IN
Country
United States
Zip Code
46202