Recent developments in molecular biology and cancer epidemiology are jointly making fundamental contributions to the study of the etiology, diagnosis, prognosis and treatment of cancers. Over the last two decades, case-control studies have been increasingly used to study the association between different types of cancer and a candidate gene. More recently, many premier cancer and health research institutes have undertaken efforts to form global consortia of large case-control genome-wide association studies (GWAS) for various types of cancer. The modest contribution of GWAS findings in terms of explaining cancer risk has again emphasized that the role of environmental factors cannot be ignored in cancer etiology. In the post-GWAS era, many epidemiologic studies are exploring gene-environment interactions (G x E studies). The proposed research considers a variation of the case-control sampling design, namely the two-phase sampling design for G x E studies. The design describes a study setting where a set of inexpensive covariates is available on a larger study base (Phase I sample) and outcome-exposure stratified sampling is employed to select a sub-sample (Phase II sub-sample). On the Phase II sub-sample, expensive genetic or biomarker data are measured. The goal is to investigate G x E interactions under such sampling designs. The proposed methods lead to efficient use of all available data in Phase I and Phase II through an appropriate two-phase joint retrospective likelihood. More subtle issues are also considered, such as the existence of non-monotone missing data in the Phase II sub-sample, relaxation of the gene-environment independence assumption, and variable selection in a multi-gene model. A semiparametric profile likelihood based approach and an alternative semiparametric Bayes approach are proposed for two-phase G x E studies in Specific Aims 1 and 2, respectively.
Specific Aim 1: Development of a semiparametric profile likelihood based estimation strategy for two-phase studies of gene-environment interaction. The proposed estimation strategy can handle non-monotone missing covariate data patterns and addresses the critical issue of relaxing the gene-environment independence assumption.
Specific Aim 2: Development of an alternative semiparametric Bayesian procedure to accomplish the same modeling objectives as in Aim 1. The Bayesian methods would offer more flexibility to handle a large number of main effects and interaction terms in the disease risk model and to relax the gene-environment independence assumption. The possibility of extending Aim 2 to haplotype-based interactions will be explored.
The project team has expertise in biostatistical methodology, cancer epidemiology, human genetics, cancer therapeutics and clinical research. A concrete data example from the Molecular Epidemiology of Colorectal Cancer Study has been identified as a motivating and illustrative example for the proposed methods. It examines the evidence of effect modification of the association between colorectal cancer and long-term use of statins by genes in the cholesterol synthesis/lipid metabolism pathway. However, the methods developed in the application are generic and may be broadly applied to other cancer epidemiology studies that employ outcome-exposure stratified sampling schemes. To date, there are no existing Bayesian approaches for two-phase G x E studies. The planned research will also contribute towards filling a gap in the classical frequentist literature on handling non-monotone missing data patterns in two-phase studies. The research will provide valuable clinical insight into the chemoprotective association of statins with colorectal cancer as modified by genotypic variation.
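The two-phase design described in the abstract can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the proposed estimation method: the function names `simulate_phase1` and `select_phase2`, the prevalences, the toy risk model, and the fixed number sampled per stratum are all hypothetical choices that do not come from the application.

```python
import random
from collections import defaultdict


def simulate_phase1(n, seed=0):
    """Simulate a Phase I study base with disease status D and a cheap exposure E.

    Returns the Phase I records (no genotype) and a separate genotype lookup
    that stands in for the expensive assay run only in Phase II.
    All prevalences and risks below are made-up illustrative values.
    """
    rng = random.Random(seed)
    phase1, genotype = [], {}
    for i in range(n):
        e = int(rng.random() < 0.30)  # assumed exposure prevalence
        g = int(rng.random() < 0.20)  # assumed carrier frequency
        risk = 0.05 + 0.05 * e + 0.03 * g + 0.05 * e * g  # toy risk with a G x E term
        d = int(rng.random() < risk)
        phase1.append({"id": i, "D": d, "E": e})
        genotype[i] = g
    return phase1, genotype


def select_phase2(phase1, genotype, per_stratum, seed=1):
    """Outcome-exposure stratified sampling: draw up to per_stratum subjects
    from each (D, E) cell and 'measure' the genotype only on those subjects."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in phase1:
        strata[(s["D"], s["E"])].append(s)
    phase2 = []
    for key in sorted(strata):
        members = strata[key]
        k = min(per_stratum, len(members))
        weight = len(members) / k  # inverse of the stratum sampling fraction
        for s in rng.sample(members, k):
            phase2.append({**s, "G": genotype[s["id"]], "weight": weight})
    return phase2


phase1, genotype = simulate_phase1(5000)
phase2 = select_phase2(phase1, genotype, per_stratum=50)
print(len(phase1), len(phase2))
```

Sampling a fixed number per (D, E) cell over-represents rare cells, such as exposed cases, relative to their share of Phase I. The recorded `weight` only makes the sampling fractions explicit. The application itself proposes a joint retrospective likelihood over both phases rather than a weighting approach.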

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Small Research Grants (R03)
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (O1))
Program Officer: Dunn, Michelle C
University of Michigan Ann Arbor
Biostatistics & Other Math Sci
Schools of Public Health
Ann Arbor
United States
Stenzel, Stephanie L; Ahn, Jaeil; Boonstra, Philip S et al. (2015) The impact of exposure-biased sampling designs on detection of gene-environment interactions in case-control studies with potential exposure misclassification. Eur J Epidemiol 30:413-23
Boonstra, Philip S; Bondarenko, Irina; Park, Sung Kyun et al. (2014) Propensity score-based diagnostics for categorical response regression models. Stat Med 33:455-469
Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A et al. (2014) Testing departure from additivity in Tukey's model using shrinkage: application to a longitudinal setting. Stat Med 33:5177-91
Li, Shi; Mukherjee, Bhramar; Taylor, Jeremy M G et al. (2014) The role of environmental heterogeneity in meta-analysis of gene-environment interactions with quantitative traits. Genet Epidemiol 38:416-29
Ahn, Jaeil; Johnson, Timothy D; Bhavnani, Darlene et al. (2014) A space-time point process model for analyzing and predicting case patterns of diarrheal disease in northwestern Ecuador. Spat Spatiotemporal Epidemiol 9:23-35
Boonstra, Philip S; Mukherjee, Bhramar; Taylor, Jeremy M G (2013) Bayesian shrinkage methods for partially observed data with many predictors. Ann Appl Stat 7:2272-2292
Ko, Yi-An; Saha-Chaudhuri, Paramita; Park, Sung Kyun et al. (2013) Novel likelihood ratio tests for screening gene-gene and gene-environment interactions with unbalanced repeated-measures data. Genet Epidemiol 37:581-91
Ahn, Jaeil; Mukherjee, Bhramar; Gruber, Stephen B et al. (2013) Bayesian semiparametric analysis for two-phase studies of gene-environment interaction. Ann Appl Stat 7:543-569
Li, Shi; Mukherjee, Bhramar; Batterman, Stuart et al. (2013) Bayesian analysis of time-series data under case-crossover designs: posterior equivalence and inference. Biometrics 69:925-36
Boonstra, Philip S; Taylor, Jeremy M G; Mukherjee, Bhramar (2013) Incorporating auxiliary information for improved prediction in high-dimensional datasets: an ensemble of shrinkage approaches. Biostatistics 14:259-72

Showing the most recent 10 out of 14 publications