Despite the success of genome-wide association studies in identifying hundreds of loci associated with common, complex diseases, significant challenges remain for statistical inference in these high-dimensional data. Specifically, rare variants identified by emerging genome-wide sequencing studies may explain the "missing heritability", but they pose a challenge to the traditional locus-by-locus approach. Studies of gene-environment interactions have yielded few successes, possibly because of limitations in existing analytical methods. Mediation of genetic effects by intermediate outcomes is an emerging topic of interest that may lead to disease prevention or treatment; statistical methods for inferring mediation effects, however, remain underdeveloped. In this proposal, we plan to develop novel statistical methods to address these challenges. The methodological research is motivated by, but not limited to, the genome-wide association studies and the sequencing project in the Women's Health Initiative (WHI), including the "Genomics and Randomized Trials Network" (GARNET), "Population Architecture of Genes and Environment" (PAGE), and the "Exome Sequencing Project" (ESP). A distinctive feature of this proposal is that the PI and co-investigators are directly conducting these studies, so the proposed methodological innovations will be applied immediately to address scientific questions of interest.
A number of statistical methods for rare variant analysis have been proposed recently, most of them gene-set tests that jointly evaluate the variants within a gene. None of the existing methods accounts for the presence of neutral variants, i.e., alleles that have no functional influence on the trait. Including neutral variants in such gene-set tests dilutes power. We propose a class of finite mixture models that explicitly teases out neutral variants to improve power.
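A simplified calculation shows why neutral variants dilute power. Suppose, purely for illustration, that per-variant association z-scores are independent and combined as the unweighted burden statistic sum(z_j) / sqrt(J). If 5 causal variants each have expected z-score 3, the statistic has expected value 15 / sqrt(5) ≈ 6.7; adding 20 neutral variants (expected z-score 0) lowers it to 15 / sqrt(25) = 3. The sketch below fits a generic two-component Gaussian mixture to such z-scores by EM, with neutral variants following N(0, 1) and non-neutral variants N(mu, 1), and reports each variant's posterior probability of being non-neutral. It is not the proposal's model, whose exact form the abstract does not specify; the function names, the unit-variance and single-direction assumptions, and the simulated data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_signal(z, pi0, mu):
    """Hypothetical helper: posterior probability that each variant is non-neutral."""
    post = []
    for zj in z:
        f_neutral = pi0 * normal_pdf(zj)              # neutral: z_j ~ N(0, 1)
        f_signal = (1.0 - pi0) * normal_pdf(zj, mu)   # non-neutral: z_j ~ N(mu, 1)
        post.append(f_signal / (f_neutral + f_signal))
    return post


def fit_neutral_mixture(z, pi0=0.5, mu=1.0, n_iter=500, tol=1e-10):
    """EM for z_j ~ pi0 * N(0, 1) + (1 - pi0) * N(mu, 1).

    pi0 is the fraction of neutral variants and mu the mean z-score of the
    non-neutral ones. Assumes independent z-scores with effects in one direction.
    """
    for _ in range(n_iter):
        post = posterior_signal(z, pi0, mu)                       # E-step
        total = sum(post)
        new_pi0 = 1.0 - total / len(z)                            # M-step: mixing weight
        new_mu = sum(p * zj for p, zj in zip(post, z)) / total    # M-step: signal mean
        converged = abs(new_pi0 - pi0) < tol and abs(new_mu - mu) < tol
        pi0, mu = new_pi0, new_mu
        if converged:
            break
    return pi0, mu, posterior_signal(z, pi0, mu)


# Simulated gene: 200 neutral variants and 50 non-neutral ones with mean z-score 3.
rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(50)]
pi0, mu, post = fit_neutral_mixture(z)
print(f"estimated neutral fraction {pi0:.2f}, signal mean {mu:.2f}")
```

The posterior probabilities could then serve as variant weights, so that likely-neutral variants contribute little to a gene-level statistic; because such weights are estimated from the same data, significance would have to be assessed by permutation or another method that accounts for that dependence.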
The main challenge in identifying gene-environment interactions is the lack of power due to limited sample sizes and the typically small magnitude of interactions. Dimension reduction, such as gene-set based inference, is critical for reducing the number of hypothesis tests and aggregating weak genetic effects. We will develop a suite of gene-set based, two-stage filtering procedures for detecting gene-environment interactions. We will also develop a multivariate sparse gene-set testing framework with an L1 penalty to assemble weak genetic effects within a gene or a pathway.
The main difficulty in inferring mediation of genetic effects on diseases by intermediate outcomes is controlling for unknown confounders. Current approaches exploit "Mendelian randomization", the random segregation of alleles at meiosis, and use known genetic risk alleles as instrumental variables to infer causality. Limitations of the existing framework, mainly overly restrictive assumptions and the inability to model causal effects on binary outcomes, have limited the applicability of such inference. We will revamp the instrumental variable framework, originally developed in econometrics, to better suit genetic studies.
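The instrumental-variable logic behind Mendelian randomization can be illustrated with the single-instrument Wald ratio, cov(G, Y) / cov(G, X), where G is a risk-allele count, X an intermediate outcome, and Y the outcome. The minimal sketch below assumes a continuous outcome, linear effects, and the standard IV conditions (G is associated with X, independent of the unmeasured confounder U, and affects Y only through X). The simulation settings and the names `simulate` and `wald_ratio` are hypothetical, and the code is not the proposal's method.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, p=0.3, a=0.5, b=0.3, seed=7):
    """Hypothetical data: SNP G -> intermediate X -> outcome Y, with U confounding X and Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < p) + int(rng.random() < p)  # risk-allele count 0/1/2
        ui = rng.gauss(0.0, 1.0)                            # unmeasured confounder
        xi = a * gi + ui + rng.gauss(0.0, 1.0)              # intermediate outcome
        yi = b * xi + ui + rng.gauss(0.0, 1.0)              # continuous outcome, true effect b
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def wald_ratio(g, x, y):
    """Single-instrument IV estimate of the X -> Y effect: cov(G, Y) / cov(G, X)."""
    return cov(g, y) / cov(g, x)


g, x, y = simulate(5000)
ols = cov(x, y) / cov(x, x)  # naive regression slope of Y on X, biased by U
iv = wald_ratio(g, x, y)
print(f"OLS slope {ols:.2f}, Wald ratio {iv:.2f}, true effect 0.30")
```

Under these settings var(X) = 0.25 × 0.42 + 2 = 2.105 and cov(X, U) = 1, so the naive slope converges to about 0.3 + 1/2.105 ≈ 0.78, whereas the Wald ratio is consistent for 0.3 because G is independent of U. The sketch covers only the continuous-outcome case; handling binary outcomes is one of the limitations of the existing framework noted above.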

Public Health Relevance

This proposal aims to develop novel statistical methods for analyzing high-throughput genotyping and sequencing data, addressing three outstanding challenges in current genetic epidemiology: rare variants, gene-environment interactions, and mediation by intermediate outcomes. The proposed methods will help identify genetic predispositions and environmental exposures, which may in turn inform the prevention and treatment of common diseases.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Burwen, Dale R
Fred Hutchinson Cancer Research Center
United States
Dai, James Y; Li, Shuying S; Gilbert, Peter B (2014) Case-only method for cause-specific hazards models with application to assessing differential vaccine efficacy by viral and host genetics. Biostatistics 15:196-203
Logsdon, Benjamin A; Dai, James Y; Auer, Paul L et al. (2014) A variational Bayes discrete mixture test for rare variant association. Genet Epidemiol 38:21-30
Tapsoba, Jean de Dieu; Kooperberg, Charles; Reiner, Alexander et al. (2014) Robust estimation for secondary trait association in case-control genetic studies. Am J Epidemiol 179:1264-72
Li, Shuying S; Gilbert, Peter B; Tomaras, Georgia D et al. (2014) FCGR2C polymorphisms associate with HIV-1 vaccine protection in RV144 trial. J Clin Invest 124:3879-90
Holmes, Michael V; Dale, Caroline E; Zuccolo, Luisa et al. (2014) Association between alcohol and cardiovascular disease: Mendelian randomisation analysis based on individual participant data. BMJ 349:g4164
Dai, James Y; Chan, Kwun Chuen Gary; Hsu, Li (2014) Testing concordance of instrumental variable effects in generalized linear models with application to Mendelian randomization. Stat Med 33:3986-4007
Pandey, Janardan P; Namboodiri, Aryan M; Bu, Shizhong et al. (2013) Immunoglobulin genes and the acquisition of HIV infection in a randomized trial of recombinant adenovirus HIV vaccine. Virology 441:70-4