Lung cancer is one of the most common causes of mortality worldwide. Radiomic features have been shown to have prognostic value for lung cancer outcomes. Quantitative imaging features, often in dauntingly large numbers, are extracted from tumor regions. However, not all of these extracted features are useful for tumor characterization, so feature selection is key to achieving the best performance. We plan to develop feasible statistical methods to select relevant features and conduct feature learning, i.e., the discovery of the representations needed for feature detection from the raw data. At the molecular level, the expression and genetic variation of some known genes, such as the KDM4 genes, have been linked to lung cancer prognosis, though little is known about the role of epigenetic modifications. Even fewer studies have investigated how the interplay between DNA methylation and coexisting chronic obstructive pulmonary disease (COPD; a major clinical risk factor) affects lung cancer risk. Statistically, drawing inference in regression settings (e.g., generalized linear models, Cox proportional hazards models and censored quantile regression models) is very challenging when the number of predictors (the clinical indicators and the methylation sites) exceeds the sample size. We plan to establish a new framework to draw inferences based on these complicated models. Growing evidence suggests that cancer can be better understood through mutated or dysregulated pathways or networks than through individual DNA mutations, and that the mechanism of lung cancer involves the interplay of cellular heterogeneity and a myriad of dysfunctional molecular and genetic networks. We plan to develop new models to analyze these large-scale network/pathway data and to investigate how their dynamic network structures can be predicted from DNA mutations.
Leveraging the rich Boston Lung Cancer Survival Cohort database with 11,164 lung cancer cases, we expect that our new statistical methods will help identify novel biomarkers linked to lung cancer. Our promising preliminary results indicate the feasibility of the proposed work, which provides a solid radiomic and molecular basis for prediction of lung cancer outcomes. Core methods will be distributed in open-source, freely available software, naturally leading to implementable procedures for researchers and practitioners.

Public Health Relevance

Leveraging the rich Boston Lung Cancer Survival Cohort (BLCSC) database with 11,164 lung cancer cases, we aim to develop new statistical methods to identify novel biomarkers linked to lung cancer. The BLCSC was the first study to discover, in 2004, the relevance of EGFR mutations to treatment response, starting the era of targeted therapy. With our strong collaborative team, rich databases and sound statistical methodologies, the findings from this proposal have the potential to further impact medical practice.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
1R01CA249096-01A1
Application #
10119973
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Chen, Huann-Sheng
Project Start
2021-01-01
Project End
2024-12-31
Budget Start
2021-01-01
Budget End
2021-12-31
Support Year
1
Fiscal Year
2021
Total Cost
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109