Lung cancer is one of the most common causes of mortality worldwide. Radiomic features have been shown to provide prognostic value in predicting lung cancer outcomes. Quantitative imaging features, often in dauntingly large numbers, are extracted from tumor regions. However, not all of these extracted features are useful for tumor characterization, and feature selection is key to achieving the best performance. We plan to develop feasible statistical methods to select relevant features and to conduct feature learning, i.e., the discovery of representations needed for feature detection from the raw data. On the molecular level, the expression and genetic variation of some known genes, such as the KDM4 genes, have been linked to lung cancer prognosis, though little is known about the roles of epigenetic modifications. Even fewer studies have investigated how the interplay of DNA methylation and coexisting chronic obstructive pulmonary disease (COPD; a major clinical risk factor) affects lung cancer risk. Statistically, drawing inference in regression settings, e.g., generalized linear models, Cox proportional hazards models, and censored quantile regression models, is very challenging when the predictors (the clinical indicators and the methylation sites) outnumber the sample size. We plan to establish a new framework for drawing inference based on these complicated models. Growing evidence suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than through individual DNA mutations, and that the mechanism of lung cancer involves the interplay of cellular heterogeneity and a myriad of dysfunctional molecular and genetic networks. We plan to develop new models to analyze these large-scale network/pathway data and to investigate how their dynamic network structures can be predicted from DNA mutations.
Leveraging the rich Boston Lung Cancer Survival Cohort database with 11,164 lung cancer cases, we expect that our new statistical methods will help identify novel biomarkers linked to lung cancer. Our promising preliminary results indicate the feasibility of the proposed work, which will provide a solid radiomic and molecular basis for the prediction of lung cancer outcomes. Core methods will be distributed as open-source, freely available software, giving researchers and practitioners readily implementable procedures.

Public Health Relevance

Leveraging the rich Boston Lung Cancer Survival Cohort (BLCSC) database with 11,164 lung cancer cases, we aim to develop new statistical methods to identify novel biomarkers linked to lung cancer. In 2004, the BLCSC was the first study to discover the relevance of EGFR mutations to treatment response, starting the era of targeted therapy. Supported by our strong collaborative team, rich databases, and sound statistical methodologies, the findings from this proposal have the potential to further impact medical practice.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Project #
Application #
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Chen, Huann-Sheng
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Michigan Ann Arbor
Biostatistics & Other Math Sci
Schools of Public Health
Ann Arbor
United States
Zip Code