Considerable effort has been devoted to developing statistical methods for identifying G*E interactions in cancer GWAS. The existing methods suffer from serious limitations. First, most of them take a model-based approach. The model assumptions are difficult to verify in data analysis, and there is a high risk of model misspecification, which leads to false marker identification. The existing robust methods have limited applicability. Second, the existing methods adopt ineffective statistical techniques. Recently, we and others introduced effective penalization techniques for identifying important G*E interactions and showed that they significantly outperform the existing techniques. However, the existing penalization methods also have limitations. They adopt an estimation-based marker identification strategy, which is sensitive to tuning parameter selection, lacks stability, and does not provide direct false discovery rate control. In addition, they incur prohibitively high computational cost. These limitations can mask important effects, lead to inconsistent findings across studies, and result in suboptimal predictive models.
In this study, we will develop novel methods for detecting G*E interactions in the analysis of cancer etiology, prognosis, and biomarker data. The proposed methods will have a robustness property not shared by the model-based approach. They will adopt novel penalization techniques and advance beyond the existing penalization methods by adopting and directly comparing multiple marker identification strategies. They will be able to conduct both marginal and joint analyses and both individual marker- and pathway-level analyses. By adopting a progressive approach, they will be computationally affordable with whole-genome data.
Specifically, we will: (Aim 1) Develop robust penalization methods for identifying important environmental, genetic, and G*E risk factors associated with cancer risk, survival, and biomarkers. We will develop effective computational algorithms and rigorously prove the robustness and consistency properties. Extensive simulations and comparisons will be conducted.
(Aim 2) Develop user-friendly software and a project website. We will make the software and other research results easily accessible.
(Aim 3) Analyze data on melanoma and other cancer types and identify important G*E interactions. We will comprehensively evaluate the identified markers and compare them with the results obtained using existing methods.
This study will deliver a set of novel methods that will have superior statistical and numerical properties and identify important markers missed by existing methods. They will be broadly applicable to a large number of cancer types and to multiple types of genetic, genomic, and epigenetic measurements. In data analysis, the identified markers will provide important insights into the biological mechanisms underlying melanoma and other cancers and serve as a basis for future validation studies and clinical practice.

Public Health Relevance

Robust penalization methods will be developed for identifying important gene-environment interactions and genetic effects in the analysis of cancer GWAS data. These methods will have significant advantages over the existing methods. Data on melanoma and other cancer types, both available from dbGaP and TCGA and generated in an ongoing study, will be analyzed.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21CA191383-02
Application #: 8990829
Study Section: Special Emphasis Panel (ZRG1-HDM-R (50))
Program Officer: Chen, Huann-Sheng
Project Start: 2014-12-24
Project End: 2017-12-31
Budget Start: 2016-01-01
Budget End: 2016-12-31
Support Year: 2
Fiscal Year: 2016
Total Cost: $144,855
Indirect Cost: $57,855

Name: Yale University
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 043207562
City: New Haven
State: CT
Country: United States
Zip Code: 06520

Xu, Yaqing; Wu, Mengyun; Zhang, Qingzhao et al. (2018) Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics :
Wu, Cen; Jiang, Yu; Ren, Jie et al. (2018) Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Stat Med 37:437-456
Wu, Mengyun; Huang, Jian; Ma, Shuangge (2018) Identifying gene-gene interactions using penalized tensor regression. Stat Med 37:598-610
Yue, Mu; Li, Jialiang; Ma, Shuangge (2018) Sparse boosting for high-dimensional survival data with varying coefficients. Stat Med 37:789-800
Zang, Yangguang; Zhao, Qing; Zhang, Qingzhao et al. (2017) Inferring gene regulatory relationships with a high-dimensional robust approach. Genet Epidemiol 41:437-454
Chai, Hao; Shi, Xingjie; Zhang, Qingzhao et al. (2017) Analysis of cancer gene expression data with an assisted robust marker identification approach. Genet Epidemiol 41:779-789
Zhang, Qingzhao; Duan, Xiaogang; Ma, Shuangge (2017) Focused Information Criterion and Model Averaging with Generalized Rank Regression. Stat Probab Lett 122:11-19
Wu, Mengyun; Zang, Yangguang; Zhang, Sanguo et al. (2017) Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genet Epidemiol 41:523-554
Zhu, Ruoqing; Zhao, Ying-Qi; Chen, Guanhua et al. (2017) Greedy outcome weighted tree learning of optimal personalized treatment rules. Biometrics 73:391-400
Liu, Mengque; Fan, Xinyan; Fang, Kuangnan et al. (2017) Integrative sparse principal component analysis of gene expression data. Genet Epidemiol 41:844-865

Showing the most recent 10 out of 11 publications