It has recently become possible to screen many genetic markers across the whole genome for association with a disease. These genome-wide association studies (GWAS) offer great promise for identifying common disease-predisposing variants. The goal of this project is to develop a flexible framework for designing cost-effective GWAS and for optimizing subsequent replication efforts. To this end we will use a general framework for designing optimal multistage studies. In a multistage design, all markers are genotyped and tested in a first stage; only the promising markers are then genotyped in a second stage using additional samples. Our approach offers three broad advantages. First, because large sample sizes are required to discover disease-predisposing variants while controlling false discoveries, GWAS cost millions of dollars. Compared with single-stage GWAS, optimized multistage designs can achieve the same goals in terms of true and false discoveries while reducing the amount of genotyping by 50-70%. Second, single-stage designs rest entirely on assumptions that may be incorrect; as a result, their goals may not be achieved, or they may be achieved at a much higher cost than necessary. Multistage designs, in contrast, make it possible to use information collected in the first stage(s) to design optimal follow-up studies. The trend toward releasing GWAS data into the public domain will make this adaptive feature even more relevant in practice, because many research groups are likely to start performing replication studies in their own samples once GWAS data are publicly released. Third, rather than relying on arbitrary rules (e.g. treating P-values smaller than 0.05 as evidence of replication), our framework will provide statistically motivated decision rules for declaring significance and for interpreting what constitutes a replication.
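The genotyping bookkeeping behind the first advantage can be made concrete with a short calculation. The Python sketch below counts genotypes (markers × samples) for a single-stage design and for a two-stage design that genotypes all markers on a fraction of the samples and then genotypes only a fraction of the markers on the remaining samples. The function names, parameters, and example numbers are hypothetical illustrations, not values from this project. The sketch also does not check that the two designs have equal power to detect true associations; ensuring that is the job of the optimization framework itself.

```python
def genotype_counts(n_markers, n_samples, stage1_sample_frac, marker_frac_forward):
    """Return (single_stage, two_stage) genotype counts.

    Hypothetical parameters (illustration only):
      stage1_sample_frac  -- fraction of all samples genotyped on every marker in stage 1
      marker_frac_forward -- fraction of markers carried forward to stage 2
    """
    if not 0 < stage1_sample_frac <= 1:
        raise ValueError("stage1_sample_frac must be in (0, 1]")
    if not 0 <= marker_frac_forward <= 1:
        raise ValueError("marker_frac_forward must be in [0, 1]")

    # Single stage: every marker is genotyped on every sample.
    single_stage = n_markers * n_samples

    # Two stages: all markers on the stage-1 samples, then only the
    # promising markers on the remaining (stage-2) samples.
    n1 = round(n_samples * stage1_sample_frac)
    n2 = n_samples - n1
    m2 = round(n_markers * marker_frac_forward)
    two_stage = n_markers * n1 + m2 * n2
    return single_stage, two_stage


def genotyping_saving(n_markers, n_samples, stage1_sample_frac, marker_frac_forward):
    """Fraction of genotyping avoided by the two-stage design (counts only, not power)."""
    single_stage, two_stage = genotype_counts(
        n_markers, n_samples, stage1_sample_frac, marker_frac_forward
    )
    return 1 - two_stage / single_stage


if __name__ == "__main__":
    # Hypothetical scenarios: 500,000 markers, 2,000 samples in total.
    for frac1, forward in [(0.5, 0.01), (0.3, 0.005)]:
        saving = genotyping_saving(500_000, 2_000, frac1, forward)
        print(f"stage-1 samples {frac1:.0%}, markers forward {forward:.1%}: "
              f"saving {saving:.2%}")
```

The stage-1 sample fraction and the fraction of markers carried forward are exactly the quantities a multistage optimization would choose under constraints on true and false discoveries; here they are simply fixed by hand to show how the genotyping count responds.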
The specific aims of our proposal include evaluating and improving the basic framework we have already developed. To make the approach applicable across a broad range of research scenarios, we also propose a number of theoretical and computational extensions. To ensure its practical utility, we will test our methods on real data. Finally, we plan to make the computer implementation available to a broad spectrum of researchers.

Genome-wide association studies offer great promise for identifying common disease-predisposing variants. The goal of this project is to develop a flexible framework for designing these studies cost-effectively and for optimizing subsequent replication efforts.