Many heritable traits are not amenable to pedigree analysis because of incomplete penetrance, the requirement for an environmental factor, or the infeasibility of collecting genotypic data from relatives of affected individuals. Such traits must be localized by population-based genetic association studies. However, the detection by association analysis of polymorphic genes that influence quantitative traits, disease states, and other characters depends on the persistence of measurable linkage disequilibrium (i.e., haplotype-allele association). One approach to maximizing linkage disequilibrium for gene-localization studies is mapping by admixture linkage disequilibrium (MALD): for at least 20 generations after admixture, populations formed by the recent mixing of ethnic groups display transient linkage disequilibrium over longer centimorgan (cM) intervals than the panmictic founder populations from which the admixed group was derived. Theoretical and simulation studies that predict the limits of the population parameters influencing MALD assessment have been described by this and other laboratories. An integrated approach to the analysis of genetic data for MALD assessment has been developed using the SAS System of software products. Three types of basic data are stored as SAS tables: 1) genotypes by locus and subject identifier; 2) demographic and clinical data, including ethnic affiliation and disease status, by subject identifier; 3) genetic localization (chromosome and cM position) by locus. Each locus is placed on a genetic map that uses the Marshfield Research Foundation cM values as a framework. Radiation hybrid centiray (cR) values and cM values from other genome centers are converted to Marshfield cM values by principal axis regression. Allele frequencies and heterozygosity values are determined for each locus and population.
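The map-position conversion described above can be illustrated with a minimal sketch of principal axis (major axis) regression, which treats both coordinates as subject to error by taking the line along the first principal axis of their covariance. The function names, the use of the sample covariance, and the toy values are assumptions made for illustration; the abstract does not specify how the regression was set up in SAS.

```python
import math


def major_axis_fit(x, y):
    """Fit y = intercept + slope * x by principal (major) axis regression.

    x: positions on the source map (e.g., cR values) for shared markers.
    y: Marshfield cM positions for the same markers.
    Hypothetical helper; not the original SAS implementation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance of the two coordinates.
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("covariance is zero; the major axis slope is undefined")
    # Slope of the first principal axis of the 2x2 covariance matrix.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_marshfield_cm(position, slope, intercept):
    """Convert one source-map position using the fitted line."""
    return intercept + slope * position


# Toy, exactly linear values used only to exercise the functions.
source_positions = [0.0, 1.0, 2.0, 3.0]
marshfield_positions = [1.0, 3.0, 5.0, 7.0]
slope, intercept = major_axis_fit(source_positions, marshfield_positions)
print(slope, intercept, to_marshfield_cm(1.5, slope, intercept))
```

Unlike ordinary least squares, this fit does not privilege one map as error-free, which is a common reason to prefer it when both sets of positions are estimates.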
Because the frequency of homozygotes is incorporated into the computations, the confidence intervals computed for the allele frequencies are valid even in the absence of Hardy-Weinberg equilibrium. Locus-level measures of allele frequency differences between populations are determined, including composite delta (the sum of the gene frequency differences in one direction), log likelihood-ratio statistics, and Kaplan-Weir I* association statistics. To determine the significance of differences in allele frequencies, the null hypothesis that there is no association between allele frequencies at a locus and the population of origin of the subjects is tested. These tests are performed for all alleles at each locus considered together and for each allele (i.e., its presence or absence) considered individually. In addition, SAS data tables are converted to properly formatted text files for further analysis by other genetic data analysis software, including GDA and MLOCUS. The GDA program tests for Hardy-Weinberg disequilibrium and linkage disequilibrium, ascertains population hierarchy, and produces tables of pairwise genetic distances suitable for phylogenetic analysis, while the MLOCUS program computes haplotype frequencies from phase-unknown multilocus genotypes. The SAS-based software system has been used to detect and characterize linkage disequilibrium between single nucleotide polymorphisms (SNPs) at the FY and GC loci and nearby short tandem repeat (STR) markers, to determine between-population allele frequency difference statistics for over 740 STR markers, and to perform a case-control association analysis of prostate cancer in African Americans.
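Two of the locus-level computations described above can be sketched briefly. The first estimates allele frequencies with a variance that includes the observed homozygote frequency, Var(p) = (p + P_AA - 2p^2) / (2n), which reduces to the binomial p(1 - p) / (2n) only when P_AA = p^2 (Hardy-Weinberg proportions). The second computes composite delta as the sum of the positive allele frequency differences between two populations. The function names, the normal-approximation interval, and the toy genotypes are assumptions for illustration; the abstract does not state which interval method the SAS system uses.

```python
import math
from collections import Counter


def allele_frequency_intervals(genotypes, z=1.96):
    """Allele frequencies with confidence intervals that do not assume HWE.

    genotypes: list of (allele1, allele2) pairs for one locus in one population.
    Returns {allele: (frequency, lower, upper)}.
    Assumption: a normal-approximation interval with critical value z.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter()
    homozygote_counts = Counter()
    for a, b in genotypes:
        allele_counts[a] += 1
        allele_counts[b] += 1
        if a == b:
            homozygote_counts[a] += 1
    result = {}
    for allele, count in allele_counts.items():
        p = count / (2 * n)
        p_hom = homozygote_counts[allele] / n
        # Variance of the estimate, using the observed homozygote frequency.
        variance = max((p + p_hom - 2 * p * p) / (2 * n), 0.0)
        half_width = z * math.sqrt(variance)
        result[allele] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return result


def composite_delta(freqs_a, freqs_b):
    """Sum of allele frequency differences in one direction (A minus B, positive only)."""
    alleles = set(freqs_a) | set(freqs_b)
    return sum(
        max(freqs_a.get(allele, 0.0) - freqs_b.get(allele, 0.0), 0.0)
        for allele in alleles
    )


# Toy genotypes with an excess of homozygotes, used only for illustration.
sample = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B")]
print(allele_frequency_intervals(sample))
print(composite_delta({"A": 0.7, "B": 0.3}, {"A": 0.2, "B": 0.5, "C": 0.3}))
```

Because the frequencies in each population sum to one, the positive and negative differences balance, so composite delta computed this way equals half the sum of the absolute differences.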
Smith, Michael W; Patterson, Nick; Lautenberger, James A; et al. (2004) A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet 74:1001-13