It is now well recognized that both environmental exposures and genetic susceptibility contribute to the development and progression of cancer. Understanding gene-environment (G×E) interactions is essential for identifying individuals who, given their environmental exposures and genetic profiles, are at higher risk of developing cancer or of poor prognosis, and for informing environmental modifications or behavioral change interventions that can prevent or reduce disease burden. Although considerable effort has been devoted to studying G×E interactions, existing methods suffer from serious limitations that may mask genetic effects, lead to inconsistent results across studies, and yield suboptimal predictive models. There is therefore an urgent need for novel methodologies that can effectively analyze data and identify important, reproducible G×E interactions for cancer etiology and survival.
In this study, we will develop novel rank-based methods for analyzing G×E interactions in cancer etiology and survival studies. The proposed methods have robustness and consistency properties not shared by existing methods. They can accommodate the joint effects of a large number of markers, support both individual marker-level and pathway-level analyses, and are computationally affordable. We will comprehensively evaluate the proposed methods in simulation studies and compare them with existing methods. In addition, we will apply the proposed methods to identify G×E interactions in non-Hodgkin lymphoma (NHL) etiology and survival. In particular, we will first analyze the Connecticut study; the findings will be comprehensively evaluated and then validated using the NCI-SEER study.
The specific aims are as follows.
(Aim 1) Develop robust rank-based methods and detect environmental, genetic, and G×E risk factors marginally associated with etiology and survival.
(Aim 2) Develop robust rank-based penalization methods and detect environmental, genetic, and G×E risk factors with important joint effects for etiology and survival.
(Aim 3) Develop user-friendly software and a project website.
(Aim 4) Analyze the Connecticut NHL study and identify important G×E interactions. The findings will be comprehensively evaluated and then validated using the NCI-SEER study.
The proposed methods will provide a more effective way to identify G×E interactions in the development and prognosis of cancer. They will have superior statistical properties and identify important markers missed by existing methods. The identified markers will provide important insights into the biological mechanisms underlying NHL and serve as a basis for future validation studies and clinical practice.

Public Health Relevance

This study will be among the first to systematically develop and implement novel rank-based methods for the analysis of gene-environment interactions in cancer. The proposed methods will enrich the family of analytic approaches for studying gene-environment interactions and cancer genomics. They will be used to identify markers of etiology and survival of non-Hodgkin lymphoma.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Exploratory/Developmental Grants (R21)
Study Section
Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer
Mechanic, Leah E
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Wang, Sophia S; Carrington, Mary; Berndt, Sonja I et al. (2018) HLA Class I and II Diversity Contributes to the Etiologic Heterogeneity of Non-Hodgkin Lymphoma Subtypes. Cancer Res 78:4086-4096
Law, Philip J; Berndt, Sonja I; Speedy, Helen E et al. (2017) Genome-wide association analysis implicates dysregulation of immunity genes in chronic lymphocytic leukaemia. Nat Commun 8:14175
Machiela, Mitchell J; Lan, Qing; Slager, Susan L et al. (2016) Genetically predicted longer telomere length is associated with increased risk of B-cell lymphoma subtypes. Hum Mol Genet 25:1663-76
Berndt, Sonja I; Camp, Nicola J; Skibola, Christine F et al. (2016) Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat Commun 7:10933
Zeng, Xianbin; Ma, Shuangge; Qin, Yichen et al. (2015) Variable selection in strong hierarchical semiparametric models for longitudinal data. Stat Interface 8:355-365
Wu, Cen; Shi, Xingjie; Cui, Yuehua et al. (2015) A penalized robust semiparametric approach for gene-environment interactions. Stat Med 34:4016-30
Zhao, Qing; Shi, Xingjie; Huang, Jian et al. (2015) Integrative Analysis of "-Omics" Data Using Penalty Functions. Wiley Interdiscip Rev Comput Stat 7:99-108
Zhao, Qing; Shi, Xingjie; Xie, Yang et al. (2015) Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform 16:291-303
Linet, Martha S; Vajdic, Claire M; Morton, Lindsay M et al. (2014) Medical history, lifestyle, family history, and occupational risk factors for follicular lymphoma: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr 2014:26-40
Morton, Lindsay M; Slager, Susan L; Cerhan, James R et al. (2014) Etiologic heterogeneity among non-Hodgkin lymphoma subtypes: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr 2014:130-44

Showing the most recent 10 out of 42 publications