(1) Detecting and characterizing haplotype-trait associations.

This work has focused on improving the characterization of haplotype associations with traits by incorporating haplotype-specific variance parameters into the likelihood for genotypic data. Inference proceeds within a likelihood framework that involves simultaneous estimation of haplotypic effects and haplotype frequencies. The addition of the haplotypic variance is found to improve the power to detect associations under complex models, including those where only a subset of functional polymorphisms has been scored, as well as heterogeneity models where multiple mutations are linked to the haplotypes under study via linkage disequilibrium. Association tests and estimation procedures have been developed for unphased haplotypes, as well as for entire unphased diplotypes. An overall association test that includes all of the haplotypes at once has also been derived. The method was successful in finding a strong association of adrenergic receptor beta-2 (ADRB2) haplotypes with blood pressure.

(2) Effect reversal in association studies.

Failure to replicate a genetic association is a common problem. It has been observed that the direction of the effect may also be reversed between studies. Although the explanation for many of these cases is likely to be statistical in nature, it has been suggested that a reversal of effect (flip-flop) can be a consequence of a change in linkage disequilibrium (LD) between a causal variant and the observed variants. A more general model has been developed, showing that a flip-flop can be completely attributed to a change in LD only when the studied variant is merely a proxy marker for unobserved functional variation. More generally, it has been shown that a flip-flop can occur without a change in LD, or even when the LD is zero. Specific conditions have been derived for the form of genetic effects that allow for such flip-flops. In this model, a flip-flop is driven by a shift in population haplotype or allele frequencies, even though both the population prevalence and the allele frequency of the observed variant can be the same in the two populations that exhibit a flip-flop. If all relevant variants are scored, a flip-flop can no longer take place; it is thus a consequence of partial knowledge. In the case of a quantitative trait, the unobserved variants induce a difference in the variance of the trait among individuals with different scored alleles. Based on this observation, a statistical approach has been developed for discovering associations. Compared to conventional methods, the approach is more robust to the loss of power caused by a genetic flip-flop.

(3) Correlation-based inference for linkage disequilibrium.

The correlation between alleles at a pair of genetic loci is a measure of linkage disequilibrium. For loci with two alleles, the square of the sample correlation multiplied by the sample size provides the usual test statistic for the hypothesis of no disequilibrium, and this relation has proved useful for study design and marker selection. Nevertheless, the relation holds only in the di-allelic case, and an extension to multiple alleles has not been made. We studied a similar statistic, R2, which leads to a correlation-based test for loci with multiple alleles. One advantage of this approach is that R2 can be interpreted as the total correlation between a pair of loci. When the phase of two-locus genotypes is known, the approach is equivalent to a novel test for the overall correlation between rows and columns in a contingency table. In the phase-known case, R2 is the sum of the squared sample correlations for all 2-by-2 subtables formed by collapsing to one allele versus the rest at each locus. We examined the approximate distribution of R2 under the null hypothesis of independence and found close agreement with the exact distribution obtained by permutations. The test for independence using R2 is a strong competitor to approaches such as Pearson's chi-square, Fisher's exact test, and a test based on Cressie and Read's power divergence statistic. We combine this approach with previously proposed composite-disequilibrium measures to address the case when the genotypic phase is unknown. Calculation of the new multi-allele test statistic and its p-value is very simple, utilizing the approximate distribution of R2.

(4) Combining p-values in large-scale genomics experiments.

In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of the corresponding distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We exploit this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods comes at the expense of a weaker claim upon rejection of the null hypothesis: only that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP), which combines the first K ordered p-values, and the truncated product method (TPM), which combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like that of GM, more appropriately about all L tests. GM gives somewhat higher power than the TPM, RTP, Fisher, and Simes methods across a range of simulations.

(5) Ranks of a true association in large-scale genetics experiments.

In the context of a large collection of statistical genetics tests in which the number of true associations (TAs) is small, we study the distribution of the ranks of TAs among the false associations (FAs). We investigate the relative efficiency of ranking measures and how many of the best results need to be screened to cover TAs with high probability, using a few different ways of assessing significance and adjusting for multiple testing. This way of looking at the problem can aid in optimally following up on initial significant findings and in planning future large-scale experiments. Genome-wide expression studies are one prominent example, where the number of measured transcription units is in the tens of thousands. Even larger are whole-genome association scans, where the number of tests, L, is now commonly in the hundreds of thousands. The measure of association with a trait of interest could be a p-value, possibly weighted towards effect size. Under a fairly wide set of conditions, we study the rank distribution of the p-value from a single TA among a large number of FAs. We present the impact of multiple testing adjustments on the rank distributions. This study identifies situations where ranking results by effect size produces better ranks of TAs than the usual sorting by a test statistic value or by a p-value.
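
The combination statistics named in item (4) can be sketched in a few lines. The following is a minimal illustration, not the project's implementation: it assumes independent tests, obtains the Fisher p-value from the closed-form chi-square tail for even degrees of freedom, and approximates the RTP and TPM null distributions by Monte Carlo simulation. All function names, the example p-values, K = 3, and the threshold 0.05 are hypothetical choices made for illustration; the inverse gamma method (GM) is omitted.

```python
import math
import random


def fisher_combination(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2L df under the null."""
    half = -sum(math.log(p) for p in pvalues)  # half of the chi-square statistic
    # Closed-form chi-square upper tail for even df = 2L:
    # exp(-x/2) * sum_{k=0}^{L-1} (x/2)^k / k!
    term, total = 1.0, 0.0
    for k in range(len(pvalues)):
        if k > 0:
            term *= half / k
        total += term
    return math.exp(-half) * total


def log_rtp(pvalues, k):
    """Log of the rank truncated product: the k smallest p-values multiplied."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])


def log_tpm(pvalues, tau):
    """Log of the truncated product: p-values at or below the threshold tau."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def monte_carlo_pvalue(log_stat, pvalues, n_sim=2000, seed=1):
    """Approximate the null distribution of log_stat under independence
    (an assumption of this sketch) by simulating uniform p-values."""
    rng = random.Random(seed)
    observed = log_stat(pvalues)
    n_tests = len(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(n_tests)]
        if log_stat(null) <= observed:  # smaller product = stronger evidence
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.004, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
p_fisher = fisher_combination(pvals)
p_rtp = monte_carlo_pvalue(lambda ps: log_rtp(ps, 3), pvals)
p_tpm = monte_carlo_pvalue(lambda ps: log_tpm(ps, 0.05), pvals)
print(p_fisher, p_rtp, p_tpm)
```

Working on the log scale avoids numerical underflow when L is in the thousands. With correlated tests the simulated null would need to be replaced by a scheme that preserves the dependence, such as permutation of the trait values.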

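For the phase-known case in item (3), the description of R2 as a sum of squared correlations over collapsed 2-by-2 subtables can be written out directly. The sketch below follows that sentence literally and makes several assumptions: the input is a table of haplotype counts with rows indexed by alleles at the first locus and columns by alleles at the second, the function name is hypothetical, and alleles with zero frequency are skipped. The null distribution used for the test is not reproduced here.

```python
def total_squared_correlation(table):
    """Sum of squared sample correlations over all 2-by-2 subtables formed by
    collapsing to one allele versus the rest at each locus (phase known)."""
    n = sum(sum(row) for row in table)
    row_freq = [sum(row) / n for row in table]        # allele frequencies, locus 1
    col_freq = [sum(col) / n for col in zip(*table)]  # allele frequencies, locus 2
    total = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            denom = (row_freq[i] * (1 - row_freq[i])
                     * col_freq[j] * (1 - col_freq[j]))
            if denom == 0:
                continue  # correlation undefined for a monomorphic collapse
            # Disequilibrium coefficient for allele i with allele j.
            d = count / n - row_freq[i] * col_freq[j]
            total += d * d / denom
    return total


# Hypothetical haplotype counts for two loci with three alleles each.
counts = [
    [20, 5, 5],
    [4, 18, 8],
    [6, 7, 27],
]
print(total_squared_correlation(counts))
```

For two di-allelic loci all four collapsed subtables share the same squared correlation, so under this literal definition the sum equals four times the usual r squared.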
Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Intramural Research (Z01)
Project #: 1Z01ES101866-03
Application #: 7594011
Support Year: 3
Fiscal Year: 2007
Total Cost: $549,874
Country: United States

Zaykin, Dmitri V; Shibata, Kyoko (2008) Genetic flip-flop without an accompanying change in linkage disequilibrium. Am J Hum Genet 82:794-6; author reply 796-7
Zaykin, Dmitri V; Pudovkin, Alexander; Weir, Bruce S (2008) Correlation-based inference for linkage disequilibrium with multiple alleles. Genetics 180:533-45
Zaykin, Dmitri V; Zhivotovsky, Lev A; Czika, Wendy et al. (2007) Combining p-values in large-scale genomics experiments. Pharm Stat 6:217-26
Warren, L L; Hughes, A R; Lai, E H et al. (2007) Use of pairwise marker combination and recursive partitioning in a pharmacogenetic genome-wide scan. Pharmacogenomics J 7:180-9
Bang, Heejung; Mazumdar, Madhu; Zaykin, Dmitri (2007) A letter to the editor in reply to "Susceptibility to Guillain-Barre syndrome is associated to polymorphisms of CD1 genes" by Caporale et al. in the J of Neuroimmunology (2006), 177:112-118. J Neuroimmunol 186:201-2
Zaykin, Dmitri V; Meng, Zhaoling; Ehm, Margaret G (2006) Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet 78:737-46
Zaykin, Dmitri V; Young, S Stanley (2005) Large recursive partitioning analysis of complex disease pharmacogenetic studies. II. Statistical considerations. Pharmacogenomics 6:77-89
Zaykin, Dmitri V; Zhivotovsky, Lev A (2005) Ranks of genuine associations in whole-genome scans. Genetics 171:813-23