Our long-term goal is to reduce cancer risk by building accurate prediction models for cancer risk and prognosis and by developing individualized prevention and treatment strategies based on diverse data, including clinical and genomic data. Our immediate goal in the current study is to develop innovative statistical methods that identify genomic features associated with cancer risk and prognosis while incorporating biological information, including prior-knowledge gene pathways and novel microRNA (miRNA) regulatory networks. The underlying rationale for this research is that: 1) high-dimensional data such as genomic biomarkers have been obtained in many research studies and will likely be readily available in practice in the foreseeable future; 2) feature selection is imperative in order to build good prediction models using high-dimensional genomic data; 3) incorporating known and novel biological information allows information-borrowing in feature selection, resulting in greater power; and 4) semiparametric methods are more robust to model misspecification than the parametric methods that have dominated the feature selection literature. These considerations lead to four specific aims: 1) develop hierarchical feature selection of high-dimensional biomarkers in semiparametric accelerated failure time (AFT) models for cancer outcomes (e.g., time to cancer recurrence or death) with incorporation of known and novel biological information; 2) develop Bayesian feature selection of high-dimensional biomarkers in AFT models for cancer outcomes (e.g., time to cancer recurrence or death) with integrative analysis of the miRNA regulatory network and incorporation of known and novel biological information; 3) develop efficient algorithms and user-friendly software with the goal of disseminating them to cancer researchers; and 4) perform systematic evaluation of the proposed methods through extensive numerical studies, including simulations and real data analyses.
Our proposed methods distinguish themselves from existing approaches in that we use both known and novel biological information to guide feature selection, and we investigate two alternative approaches, semiparametric and fully Bayesian joint-modeling, each of which has its own strengths and weaknesses. Progress on all aims will be guided by and evaluated on motivating prostate cancer and brain tumor data, and by extensive simulation studies. The proposed methods will allow investigators to identify key genomic signatures as well as biological pathways that are predictive of cancer risk and prognosis, leading to potential drug targets and subsequently effective personalized treatments. They promise similar benefits to a wide range of biomedical science settings where similar data and biological information are often encountered.

Public Health Relevance

The immediate goal of this study is to develop innovative statistical methods for feature selection of genomic biomarkers in relation to cancer risk and prognosis with incorporation of known and novel biological information, and apply the methods to motivating prostate cancer and brain tumor data. The proposed statistical methods will allow for identification of key genomic signatures and biological pathways that are predictive of cancer risk and prognosis and subsequent development of individualized cancer prevention and treatment strategies, and promise similar benefits to a wide range of biomedical science settings.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Research Grants (R03)
Project #: 1R03CA183006-01
Application #: 8638532
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (O1))
Program Officer: Dunn, Michelle C
Project Start: 2013-12-01
Project End: 2015-11-30
Budget Start: 2013-12-01
Budget End: 2014-11-30
Support Year: 1
Fiscal Year: 2014
Total Cost: $76,950
Indirect Cost: $26,950

Name: Emory University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 066469933
City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Zhao, Yize; Kang, Jian; Long, Qi (2018) Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data. IEEE/ACM Trans Comput Biol Bioinform 15:537-550
Chang, Changgee; Kundu, Suprateek; Long, Qi (2018) Scalable Bayesian variable selection for structured high-dimensional data. Biometrics :
Safo, Sandra E; Li, Shuzhao; Long, Qi (2018) Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 74:300-312
Pellegrini, Kathryn L; Sanda, Martin G; Patil, Dattatraya et al. (2017) Evaluation of a 24-gene signature for prognosis of metastatic events and prostate cancer-specific mortality. BJU Int 119:961-967
Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:
Li, Ziyi; Safo, Sandra E; Long, Qi (2017) Incorporating biological information in sparse principal component analysis with application to genomic data. BMC Bioinformatics 18:332
Deng, Yi; Zhang, Xiaoxi; Long, Qi (2017) Bayesian modeling and prediction of accrual in multi-regional clinical trials. Stat Methods Med Res 26:752-765
Wang, Ming; Long, Qi (2016) Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic. Biometrics 72:897-906
Zhao, Yize; Chung, Matthias; Johnson, Brent A et al. (2016) Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence. J Am Stat Assoc 111:1427-1439
Torres, Mylin A; Yang, Xiaofeng; Noreen, Samantha et al. (2016) The Impact of Axillary Lymph Node Surgery on Breast Skin Thickening During and After Radiation Therapy for Breast Cancer. Int J Radiat Oncol Biol Phys 95:590-6

Showing the most recent 10 out of 14 publications