Despite the success of genome-wide association studies in identifying hundreds of loci associated with common and complex diseases, significant challenges remain for statistical inference in these high-dimensional data. Specifically, rare variants identified by emerging genome-wide sequencing studies may explain the "missing heritability", but they pose a challenge to the traditional locus-by-locus approach. Studies of gene-environment interactions have not generated many successes, possibly due to limitations of existing analytical methods. Mediation of genetic effects by intermediate outcomes is an emerging topic of interest that may lead to disease prevention or treatment. Existing statistical methods for inferring mediation effects, however, remain underdeveloped. In this proposal, we plan to build novel statistical methods to address these challenges. The methodological research is motivated by, but not limited to, the genome-wide association studies and the sequencing project in the Women's Health Initiative (WHI), including the "Genomics and Randomized Trials Network" (GARNET), "Population Architecture of Genes and Environment" (PAGE) and the "Exome Sequencing Project" (ESP). A distinguishing feature of this proposal is that the PI and co-investigators are themselves conducting these studies, so the proposed methodological innovations will be applied immediately to address scientific questions of interest.
A number of statistical methods for rare variant analysis have been proposed recently. None of the existing methods accounts for the presence of neutral variants, i.e., alleles that have no functional influence on the trait. Including neutral variants in gene-set tests dilutes power. In this proposal, we propose a class of finite mixture models that explicitly tease out neutral variants to improve power.
The main challenge in identifying gene-environment interactions is lack of power due to limited sample size and the typically small magnitude of interactions. Dimension reduction, such as gene-set based inference, is critical to reduce the number of hypothesis tests and enrich weak genetic effects. We will develop a suite of gene-set based, two-stage filtering procedures for detecting gene-environment interactions. We will also develop a multivariate sparse gene-set testing framework with an L1 penalty to assemble weak genetic effects in a gene or a pathway.
The main difficulty in inferring mediation of genetic effects on diseases by intermediate outcomes is controlling for unknown confounders. Current approaches exploit "Mendelian Randomization", the random segregation of alleles, and use known genetic risk alleles as instrumental variables to infer causality. Limitations of the existing framework, mainly its overly restrictive assumptions and its inability to model the causal effect on binary outcomes, have impeded the applicability of such inference. We will revamp the instrumental variable framework, originally developed in econometrics, to better suit genetic studies.
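The instrumental-variable idea behind Mendelian randomization can be illustrated with a small simulation. The sketch below is not the method proposed here; it is the textbook two-stage least squares estimator for a continuous outcome, applied to simulated data. All function names, effect sizes, the allele frequency, and the sample size are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n=20000, beta=0.5, seed=0):
    """Simulate a genotype, a confounded intermediate outcome, and a trait.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # risk-allele count (0, 1, 2)
    u = rng.normal(size=n)                          # unmeasured confounder
    x = 0.8 * g + u + rng.normal(size=n)            # intermediate outcome
    y = beta * x + u + rng.normal(size=n)           # continuous trait
    return g, x, y


def ols_slope(a, b):
    """Slope from a simple linear regression of b on a."""
    a_centered = a - a.mean()
    return float(a_centered @ (b - b.mean()) / (a_centered @ a_centered))


def two_stage_least_squares(g, x, y):
    """Use genotype g as an instrument for the effect of x on y."""
    # Stage 1: predict the intermediate outcome from the genotype.
    slope_gx = ols_slope(g, x)
    x_hat = x.mean() + slope_gx * (g - g.mean())
    # Stage 2: regress the trait on the genotype-predicted intermediate outcome.
    return ols_slope(x_hat, y)


g, x, y = simulate()
naive = ols_slope(x, y)                # biased by the unmeasured confounder
iv = two_stage_least_squares(g, x, y)  # uses only genotype-driven variation in x
print(f"naive OLS estimate: {naive:.3f}")
print(f"2SLS (IV) estimate: {iv:.3f}")
```

In this simulation the naive regression absorbs the confounder's contribution, while the instrumental-variable estimate relies only on the variation in the intermediate outcome that is explained by genotype. The linear second stage shown here does not carry over directly to binary outcomes, which is one of the limitations noted above.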

Public Health Relevance

The goal of this proposal is to develop novel statistical methods for the analysis of high-throughput genotyping and sequencing data, focusing on three outstanding challenges in current genetic epidemiology: rare variants, gene-environment interactions, and mediation by intermediate outcomes. The proposed methods will help identify genetic predispositions and environmental exposures, informing the prevention and treatment of common diseases.

Agency
National Institutes of Health (NIH)
Institute
National Heart, Lung, and Blood Institute (NHLBI)
Type
Research Project (R01)
Project #
5R01HL114901-03
Application #
8688341
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Burwen, Dale R
Project Start
2012-07-15
Project End
2016-06-30
Budget Start
2014-07-01
Budget End
2015-06-30
Support Year
3
Fiscal Year
2014
Total Cost
Indirect Cost
Name
Fred Hutchinson Cancer Research Center
Department
Type
DUNS #
City
Seattle
State
WA
Country
United States
Zip Code
98109
Cheng, Yichen; Dai, James Y; Paulson, Thomas G et al. (2017) Quantification of Multiple Tumor Clones Using Gene Array and Sequencing Data. Ann Appl Stat 11:967-991
Pashova, Hristina; LeBlanc, Michael; Kooperberg, Charles (2017) Structured detection of interactions with the directed lasso. Stat Biosci 9:676-691
Dai, James Y; Liang, C Jason; LeBlanc, Michael et al. (2017) Case-only approach to identifying markers predicting treatment effects on the relative risk scale. Biometrics :
Dai, James Y; Tapsoba, Jean de Dieu; Buas, Matthew F et al. (2016) Constrained Score Statistics Identify Genetic Variants Interacting with Multiple Risk Factors in Barrett's Esophagus. Am J Hum Genet 99:352-65
Dai, James Y; Zhang, Xinyi Cindy; Wang, Ching-Yun et al. (2016) Augmented case-only designs for randomized clinical trials with failure time endpoints. Biometrics 72:30-8
Cheng, Yichen; Dai, James Y; Kooperberg, Charles (2016) Group association test using a hidden Markov model. Biostatistics 17:221-34
Dai, James Y; Zhang, Xinyi Cindy (2015) Mendelian randomization studies for a continuous exposure under case-control sampling. Am J Epidemiol 181:440-9
Wang, Xiaoyu; Li, Xiaohong; Cheng, Yichen et al. (2015) Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma. Hum Genomics 9:22
Dai, James Y; de Dieu Tapsoba, Jean; Buas, Matthew F et al. (2015) A newly identified susceptibility locus near FOXP1 modifies the association of gastroesophageal reflux with Barrett's esophagus. Cancer Epidemiol Biomarkers Prev 24:1739-47
Dai, James Y; Li, Shuying S; Gilbert, Peter B (2014) Case-only method for cause-specific hazards models with application to assessing differential vaccine efficacy by viral and host genetics. Biostatistics 15:196-203

Showing the most recent 10 out of 20 publications