Our long-term goal is to reduce cancer risk by building accurate prediction models for cancer risk and prognosis and by developing individualized prevention and treatment strategies based on diverse data, including clinical and genomic data. Our immediate goal in the current study is to develop innovative statistical methods that identify genomic features related to cancer risk and prognosis while incorporating biological information, including prior-knowledge gene pathways and novel microRNA (miRNA) regulatory networks. The underlying rationale for this research is that: 1) high-dimensional data such as genomic biomarkers have been obtained in many research studies and will likely be readily available in practice in the foreseeable future; 2) feature selection is imperative to build good prediction models using high-dimensional genomic data; 3) incorporating known and novel biological information allows information-borrowing in feature selection, resulting in greater power; and 4) semiparametric methods are more robust to model misspecification than the parametric methods that have dominated the feature selection literature. These considerations lead to four specific aims: 1) develop hierarchical feature selection of high-dimensional biomarkers in semiparametric accelerated failure time (AFT) models for cancer outcomes (e.g., time to cancer recurrence or death) with incorporation of known and novel biological information; 2) develop Bayesian feature selection of high-dimensional biomarkers in AFT models for cancer outcomes (e.g., time to cancer recurrence or death) with integrative analysis of the miRNA regulatory network and incorporation of known and novel biological information; 3) develop efficient algorithms and user-friendly software with the goal of disseminating them to cancer researchers; and 4) perform systematic evaluation of the proposed methods through extensive numerical studies, including simulations and real data analyses.
Our proposed methods distinguish themselves from existing approaches in that we use both known and novel biological information to guide feature selection, and we investigate two alternative approaches, semiparametric and fully Bayesian joint modeling, each of which has its own strengths and weaknesses. Progress on all aims will be guided by, and evaluated on, motivating prostate cancer and brain tumor data as well as extensive simulation studies. The proposed methods will allow investigators to identify key genomic signatures and biological pathways that are predictive of cancer risk and prognosis, leading to potential drug targets and, subsequently, effective personalized treatments. They promise comparable benefits in a wide range of biomedical science settings where similar data and biological information are often encountered.
The immediate goal of this study is to develop innovative statistical methods for feature selection of genomic biomarkers in relation to cancer risk and prognosis with incorporation of known and novel biological information, and apply the methods to motivating prostate cancer and brain tumor data. The proposed statistical methods will allow for identification of key genomic signatures and biological pathways that are predictive of cancer risk and prognosis and subsequent development of individualized cancer prevention and treatment strategies, and promise similar benefits to a wide range of biomedical science settings.
Pellegrini, Kathryn L; Sanda, Martin G; Patil, Dattatraya et al. (2017) Evaluation of a 24-gene signature for prognosis of metastatic events and prostate cancer-specific mortality. BJU Int 119:961-967
Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:
Safo, Sandra E; Li, Shuzhao; Long, Qi (2017) Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics :
Zhao, Yize; Chung, Matthias; Johnson, Brent A et al. (2016) Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence. J Am Stat Assoc 111:1427-1439
Wang, Ming; Long, Qi (2016) Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic. Biometrics 72:897-906
Torres, Mylin A; Yang, Xiaofeng; Noreen, Samantha et al. (2016) The Impact of Axillary Lymph Node Surgery on Breast Skin Thickening During and After Radiation Therapy for Breast Cancer. Int J Radiat Oncol Biol Phys 95:590-6
Long, Qi; Johnson, Brent A (2015) Variable selection in the presence of missing data: resampling and imputation. Biostatistics 16:596-610
Tu, Huakang; Sun, Liping; Dong, Xiao et al. (2015) Temporal changes in serum biomarkers and risk for progression of gastric precancerous lesions: a longitudinal study. Int J Cancer 136:425-34
Long, Qi; Xu, Jianpeng; Osunkoya, Adeboye O et al. (2014) Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence. Cancer Res 74:3228-37
Hsu, Chiu-Hsieh; Long, Qi; Li, Yisheng et al. (2014) A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data. J Biopharm Stat 24:634-48