Considerable effort has been devoted to developing statistical methods for identifying G*E interactions in cancer GWAS. The existing methods suffer from serious limitations. First, most of them take a model-based approach. The model assumptions are difficult to verify in data analysis, and there is a high risk of model mis-specification, which leads to false marker identification. The existing robust methods have limited applicability. Second, the existing methods adopt ineffective statistical techniques. Recently, we and others introduced effective penalization techniques for identifying important G*E interactions and showed that they significantly outperform the existing techniques. However, the existing penalization methods also have limitations. They adopt an estimation-based marker identification strategy, which is sensitive to tuning parameter selection, lacks stability, and does not directly control the false discovery rate. In addition, they incur prohibitively high computational cost. These limitations can mask important effects, lead to inconsistent findings across studies, and result in suboptimal predictive models. In this study, we will develop novel methods for detecting G*E interactions in the analysis of cancer etiology, prognosis, and biomarker data. The proposed methods will have a robustness property not shared by the model-based approach. They will adopt novel penalization techniques and advance beyond the existing penalization methods by adopting and directly comparing multiple marker identification strategies. They will be able to conduct both marginal and joint analyses, at both the individual-marker and pathway levels. By adopting a progressive approach, they will be computationally affordable with whole-genome data.
Specifically, we will (Aim 1) Develop robust penalization methods for identifying important environmental, genetic, and G*E risk factors associated with cancer risk, survival, and biomarkers. We will develop effective computational algorithms and rigorously prove the robustness and consistency properties. Extensive simulations and comparisons will be conducted.
(Aim 2) Develop user-friendly software and a project website. We will make the software and other research results easily accessible.
(Aim 3) Analyze data on melanoma and other cancer types and identify important G*E interactions. We will comprehensively evaluate the identified markers and compare them with the results obtained using existing methods. This study will deliver a set of novel methods that will have superior statistical and numerical properties and identify important markers missed by existing methods. They will be broadly applicable to a large number of cancer types and to multiple types of genetic, genomic, and epigenetic measurements. In data analysis, the identified markers will provide important insights into the biological mechanisms underlying melanoma and other cancers and serve as a basis for future validation studies and clinical practice.
Robust penalization methods will be developed for identifying important gene-environment interactions and genetic effects in the analysis of cancer GWAS data. These methods will have significant advantages over the existing methods. Data on melanoma and other cancer types, which are available at dbGaP and TCGA or are being generated in an ongoing study, will be analyzed.