(1) Detecting haplotype associations

This research focuses on statistical genetics problems related to association mapping. In particular, we continue to develop haplotype-based and other multilocus methods for detecting and characterizing genetic associations with discrete and quantitative traits. Under some genetic models, methods that use haplotype information have higher power than single-locus approaches. Regardless of the potential power advantages, estimation and testing of haplotypic and multilocus effects will remain important from the biological point of view. Some of this research concerns the association problem when genetic loci are scored without knowledge of the parental gametes, that is, when the haplotype phase is unknown. It is nevertheless possible to estimate haplotype frequencies together with the corresponding effects, and this area of statistical genetics research continues to be very active. More specifically, we have been concerned with the problem of detecting haplotype associations when important causative factors (such as environmental factors) may not have been ascertained. We considered how unknown environmental and genetic factors influence the distribution of the trait of interest among individuals that carry different classes of haplotypes or diplotypes, and suggested how haplotype- or diplotype-specific parameters can be adjusted to improve power under such models. The unknown factors that we consider might influence only the trait mean, yet haplotype- and diplotype-specific variances can be modified by the effect of the unknown factor. When the unknown factor is genetic, the scored (known) haplotypes can be linked to the unknown factor via linkage disequilibrium (LD), in which case we say that they are proxies for the unknown functional variation.
They could also be partially functional, that is, the underlying effects are joint effects of the known and the unknown factor levels. In the latter case, the joint effects can be with or without interactions. In all of these cases, we find that trait variances among the observed haplotypes may be modified by the additional factors that are unaccounted for by the analysis. Such a variance contrast between haplotypes or diplotypes can be estimated and has been incorporated into tests for association for data with unknown haplotype phase. The approach provides additional power, especially under epistatic and heterogeneity models. A pronounced variance contrast may indicate the involvement of additional factors that influence the trait. Depending on the joint frequencies of the known and the unknown factors in a given population, the marginal effect at the observed locus can be negative, positive, or anywhere in between, including close to zero. Consequently, the power to detect an association can be small and unpredictable. One can express a population measure of association, such as the relative risk, in terms of the joint effects and their frequencies and solve for frequency configurations that result in the absence of an effect at the observed locus. We found, however, that a variety of genetic models, particularly those that involve an interaction between the observed locus A and the unknown factor B, result in a substantial variance contrast among the alleles (haplotypes) of the observed locus. If a sample is taken from a population with frequencies such that the marginal effect at the observed locus is small, incorporating the variance parameters into a test can improve power to detect association. One can start by testing the mean and the variance contrasts simultaneously, then proceed with the mean-specific and the variance-specific tests.
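The sequence of joint, mean-specific, and variance-specific tests can be illustrated with a minimal sketch. It makes simplifying assumptions that are not part of the methods described here: haplotype class membership is treated as known (no phase uncertainty), there are only two classes, and the trait is modeled as normal within each class. The function names and the simulated effect sizes are hypothetical. The three nested models are (a) common mean and variance, (b) separate means with a common variance, and (c) separate means and variances, so the joint likelihood ratio statistic is the sum of the mean and variance statistics.

```python
import math
import random


def mean_and_mle_variance(values):
    """Return the sample mean and the maximum-likelihood (divide-by-n) variance."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


def max_normal_loglik(n, variance):
    """Maximized normal log-likelihood for n values whose MLE variance is `variance`."""
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)


def lrt_mean_variance(carriers, noncarriers):
    """Likelihood ratio tests for mean and variance contrasts between two classes."""
    n1, n2 = len(carriers), len(noncarriers)
    n = n1 + n2
    _, v1 = mean_and_mle_variance(carriers)
    _, v2 = mean_and_mle_variance(noncarriers)
    _, v_null = mean_and_mle_variance(list(carriers) + list(noncarriers))
    v_pooled = (n1 * v1 + n2 * v2) / n

    ll_null = max_normal_loglik(n, v_null)     # (a) common mean, common variance
    ll_means = max_normal_loglik(n, v_pooled)  # (b) separate means, common variance
    ll_full = max_normal_loglik(n1, v1) + max_normal_loglik(n2, v2)  # (c) all separate

    lr_joint = max(2.0 * (ll_full - ll_null), 0.0)
    lr_mean = max(2.0 * (ll_means - ll_null), 0.0)
    lr_var = max(2.0 * (ll_full - ll_means), 0.0)
    return {
        "lr_joint": lr_joint,
        "p_joint": math.exp(-lr_joint / 2.0),            # chi-square tail, 2 df
        "lr_mean": lr_mean,
        "p_mean": math.erfc(math.sqrt(lr_mean / 2.0)),   # chi-square tail, 1 df
        "lr_var": lr_var,
        "p_var": math.erfc(math.sqrt(lr_var / 2.0)),     # chi-square tail, 1 df
    }


# Hypothetical scenario: among carriers, an unobserved factor shifts the trait
# up or down by 1.5 with equal probability, so the marginal mean effect is
# zero but the trait variance among carriers is inflated.
rng = random.Random(2008)
noncarriers = [rng.gauss(0.0, 1.0) for _ in range(500)]
carriers = [rng.gauss(rng.choice([-1.5, 1.5]), 1.0) for _ in range(500)]

result = lrt_mean_variance(carriers, noncarriers)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

In this simulated configuration the signal sits in the variance contrast rather than in the mean, which is the situation where adding variance parameters to the test is expected to help. A test for data with unknown phase would replace the known class labels with estimated haplotype or diplotype probabilities, which this sketch does not attempt.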
A substantial variance contrast might be considered an indication of the involvement of important factors unaccounted for by the study.

We proposed and compared several variations of these tests within the likelihood ratio framework. The tests included in the comparisons differed in several respects: (i) the variances can be estimated in different ways, and can be allowed to be the same or different while comparing the means; (ii) the overall test that includes all haplotypes can be based on summing contributions across haplotypes or, alternatively, the overall statistic can be driven by the haplotypes with the largest contributions; (iii) the models of analysis can be haplotype-based or diplotype-based, with diplotype-based variations allowing for co-dominant, dominant, and recessive modes of analysis.

(2) Composite linkage disequilibrium correlation for multiple alleles

Several earlier studies found good performance and robustness of the composite LD measures. Schaid (2004) extended composite LD testing to the case of multiple alleles, while explicitly incorporating Hardy-Weinberg equilibrium (HWE) deviations into the variance of the test statistic. An earlier method by Weir (Biometrics 1979) was based on a sum of the squared composite LD correlation coefficients; however, it is desirable to find an approximate distribution for a similar statistic that incorporates HWE deviations explicitly into the sum and works for multiple alleles. This year, we published two chi-square approximations: one based on eigenvalues of the variance-covariance matrix of the composite LD coefficients, and a second, much simpler chi-square approximation for a statistic that adds (non-independent) squared LD coefficients across all pairs of alleles, while still incorporating HWE deviations into the variance. We performed extensive comparisons showing good statistical properties of these approximations.
For the case of known haplotype phase, the new test has high power as a test for homogeneity of multinomial samples (e.g., samples of alleles or genotypes) obtained from several populations. Its power is higher than that of traditional approaches under models of association between the rows and columns of the contingency table in which the association is spread out among multiple cells.
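The statistic that sums squared composite LD correlations across allele pairs can be sketched as follows for unphased genotype data. The sketch relies on the fact that the Pearson correlation between per-individual allele counts at two loci equals the composite LD coefficient divided by a standard deviation term that includes the HWE deviation at each locus. The function names are hypothetical, and the scaling factor (k-1)(m-1)/(km) with (k-1)(m-1) degrees of freedom is an assumption of this sketch that should be checked against Zaykin, Pudovkin and Weir (2008) before use; for two biallelic loci it reduces to n times the squared correlation.

```python
import math


def allele_count_vectors(genotypes):
    """Map each allele to its per-individual copy count (0, 1, or 2)."""
    alleles = sorted({a for g in genotypes for a in g})
    return {a: [(g[0] == a) + (g[1] == a) for g in genotypes] for a in alleles}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def composite_ld_statistic(locus_a, locus_b):
    """Sum of squared composite LD correlations over all allele pairs.

    locus_a, locus_b: lists of unordered allele pairs, one pair per individual,
    in the same individual order. Returns (statistic, degrees_of_freedom).
    """
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must be scored on the same individuals")
    counts_a = allele_count_vectors(locus_a)
    counts_b = allele_count_vectors(locus_b)
    k, m = len(counts_a), len(counts_b)
    if k < 2 or m < 2:
        raise ValueError("each locus needs at least two observed alleles")
    n = len(locus_a)
    r2_sum = sum(
        pearson(ca, cb) ** 2 for ca in counts_a.values() for cb in counts_b.values()
    )
    # Assumed scaling and degrees of freedom (see the note above the code).
    statistic = n * (k - 1) * (m - 1) / (k * m) * r2_sum
    return statistic, (k - 1) * (m - 1)


# Small made-up data set: a tri-allelic locus and a biallelic locus.
locus_1 = [("A1", "A1"), ("A1", "A2"), ("A2", "A3"), ("A3", "A3"),
           ("A1", "A3"), ("A2", "A2"), ("A1", "A2"), ("A3", "A3")]
locus_2 = [("B", "B"), ("B", "b"), ("b", "b"), ("b", "b"),
           ("B", "b"), ("B", "b"), ("B", "B"), ("b", "b")]
statistic, df = composite_ld_statistic(locus_1, locus_2)
print(f"statistic = {statistic:.3f} on {df} degrees of freedom")
```

A p-value could then be obtained from a chi-square tail probability, for example with `scipy.stats.chi2.sf(statistic, df)`. Because the correlations are computed from allele counts rather than from phased haplotypes, departures from HWE enter through the variance of the counts without a separate correction step.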

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Intramural Research (Z01)
Project #
1Z01ES101866-04
Application #
7734541
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
4
Fiscal Year
2008
Total Cost
$361,127
Indirect Cost
City
State
Country
United States
Zip Code
Zaykin, Dmitri V; Shibata, Kyoko (2008) Genetic flip-flop without an accompanying change in linkage disequilibrium. Am J Hum Genet 82:794-6; author reply 796-7
Zaykin, Dmitri V; Pudovkin, Alexander; Weir, Bruce S (2008) Correlation-based inference for linkage disequilibrium with multiple alleles. Genetics 180:533-45
Zaykin, Dmitri V; Zhivotovsky, Lev A; Czika, Wendy et al. (2007) Combining p-values in large-scale genomics experiments. Pharm Stat 6:217-26
Warren, L L; Hughes, A R; Lai, E H et al. (2007) Use of pairwise marker combination and recursive partitioning in a pharmacogenetic genome-wide scan. Pharmacogenomics J 7:180-9
Bang, Heejung; Mazumdar, Madhu; Zaykin, Dmitri (2007) A letter to the editor in reply to "Susceptibility to Guillain-Barre syndrome is associated to polymorphisms of CD1 genes" by Caporale et al. in the J of Neuroimmunology (2006), 177:112-118. J Neuroimmunol 186:201-2
Zaykin, Dmitri V; Meng, Zhaoling; Ehm, Margaret G (2006) Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet 78:737-46
Zaykin, Dmitri V; Young, S Stanley (2005) Large recursive partitioning analysis of complex disease pharmacogenetic studies. II. Statistical considerations. Pharmacogenomics 6:77-89
Zaykin, Dmitri V; Zhivotovsky, Lev A (2005) Ranks of genuine associations in whole-genome scans. Genetics 171:813-23