Feature Selection for Genomic Data Using Known and Novel Biological Information

Long, Qi

Abstract

Our long-term goal is to reduce cancer risk by building accurate prediction models for cancer risk and prognosis and developing individualized prevention and treatment strategies based on diverse data including clinical and genomic data. Our immediate goal in the current study is to develop innovative statistical methods to identify genomic features in relation to cancer risk and prognosis with incorporation of biological information, including prior-knowledge gene pathways and novel microRNA (miRNA) regulatory networks. The underlying rationale for this research is that 1) high-dimensional data such as genomic biomarkers have been obtained in many research studies and will likely be readily available in practice in the foreseeable future; 2) feature selection is imperative in order to build good prediction models using high-dimensional genomic data; 3) incorporating known and novel biological information allows information-borrowing in feature selection, resulting in greater power; and 4) semiparametric methods are more robust to model misspecification than the parametric methods that have dominated the literature in feature selection. These considerations lead to four specific aims: 1) develop hierarchical feature selection of high-dimensional biomarkers in semiparametric accelerated failure time (AFT) models for cancer outcomes (e.g., time to cancer recurrence or death) with incorporation of known and novel biological information; 2) develop Bayesian feature selection of high-dimensional biomarkers in AFT models for cancer outcomes (e.g., time to cancer recurrence or death) with integrative analysis of the miRNA regulatory network and incorporation of known and novel biological information; 3) develop efficient algorithms and user-friendly software with the goal of disseminating them to cancer researchers; and 4) perform systematic evaluation of the proposed methods through extensive numerical studies including simulations and real data analyses.
Our proposed methods distinguish themselves from existing approaches in that we use both known and novel biological information to guide feature selection, and we investigate two alternative approaches, semiparametric and fully Bayesian joint-modeling, each of which has its own strengths and weaknesses. Progress on all aims will be guided by and evaluated on motivating prostate cancer and brain tumor data, and by extensive simulation studies. The proposed methods will allow investigators to identify key genomic signatures as well as biological pathways that are predictive of cancer risk and prognosis, leading to potential drug targets and subsequently effective personalized treatments. They promise similar benefits to a wide range of biomedical science settings where similar data and biological information are often encountered.

Public Health Relevance

The immediate goal of this study is to develop innovative statistical methods for feature selection of genomic biomarkers in relation to cancer risk and prognosis with incorporation of known and novel biological information, and apply the methods to motivating prostate cancer and brain tumor data. The proposed statistical methods will allow for identification of key genomic signatures and biological pathways that are predictive of cancer risk and prognosis and subsequent development of individualized cancer prevention and treatment strategies, and promise similar benefits to a wide range of biomedical science settings.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Research Grants (R03)
Project #: 1R03CA183006-01
Application #: 8638532
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (O1))
Program Officer: Dunn, Michelle C

Project Start: 2013-12-01
Project End: 2015-11-30
Budget Start: 2013-12-01
Budget End: 2014-11-30
Support Year: 1
Fiscal Year: 2014
Total Cost: $76,950
Indirect Cost: $26,950

Institution

Name: Emory University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 066469933

City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Related projects


NIH 2015 R03 CA	Feature Selection for Genomic Data Using Known and Novel Biological Information	Long, Qi / Emory University	$69,255
NIH 2014 R03 CA	Feature Selection for Genomic Data Using Known and Novel Biological Information	Long, Qi / Emory University	$76,950

Publications

Chang, Changgee; Kundu, Suprateek; Long, Qi (2018) Scalable Bayesian variable selection for structured high-dimensional data. Biometrics :

Safo, Sandra E; Li, Shuzhao; Long, Qi (2018) Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics 74:300-312

Zhao, Yize; Kang, Jian; Long, Qi (2018) Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data. IEEE/ACM Trans Comput Biol Bioinform 15:537-550

Pellegrini, Kathryn L; Sanda, Martin G; Patil, Dattatraya et al. (2017) Evaluation of a 24-gene signature for prognosis of metastatic events and prostate cancer-specific mortality. BJU Int 119:961-967

Hu, Yi-Juan; Schmidt, Amand F; Dudbridge, Frank et al. (2017) Impact of Selection Bias on Estimation of Subsequent Event Risk. Circ Cardiovasc Genet 10:

Li, Ziyi; Safo, Sandra E; Long, Qi (2017) Incorporating biological information in sparse principal component analysis with application to genomic data. BMC Bioinformatics 18:332

Deng, Yi; Zhang, Xiaoxi; Long, Qi (2017) Bayesian modeling and prediction of accrual in multi-regional clinical trials. Stat Methods Med Res 26:752-765

Wang, Ming; Long, Qi (2016) Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic. Biometrics 72:897-906

Zhao, Yize; Chung, Matthias; Johnson, Brent A et al. (2016) Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence. J Am Stat Assoc 111:1427-1439

Torres, Mylin A; Yang, Xiaofeng; Noreen, Samantha et al. (2016) The Impact of Axillary Lymph Node Surgery on Breast Skin Thickening During and After Radiation Therapy for Breast Cancer. Int J Radiat Oncol Biol Phys 95:590-6

Showing the most recent 10 out of 14 publications
