Genetic linkage maps can be created using pedigree methods, which study individuals whose relationships are known. These methods allow us to infer the locations in the genome where genetic crossovers occur. Their limitation is that a pedigree contains only a limited number of individuals. When we try to make a genetic map of markers that are close together in the genome, we may not see any crossovers between those markers, even in a rather large pedigree. Linkage disequilibrium mapping instead uses individuals sampled from a large population. They are connected by a pedigree that is much deeper in time, and thus has many more individuals in it and many more opportunities for crossovers to occur. The difficulty is that we do not know the pedigree, and must use the genetic data to estimate it. The random trees of ancestry of gene copies in a large population are called coalescents. The widely used statistical method known as maximum likelihood can be applied to linkage disequilibrium mapping by summing the likelihood over all the possible coalescent trees that could explain the data. The number of these trees is vast, but it has been possible to approximate likelihoods in coalescents successfully using random sampling methods. We have developed such a sampling method, a Metropolis-Hastings sampler, for the case of recombining loci. It is proposed to adapt this to linkage disequilibrium mapping. One of the problems that has to be solved to do this is to make use of data that consist of diploid genotypes. It is proposed to do this by an additional stage of random sampling, so as to sum over all the ways that the diploid genotypes could be resolved into haplotypes. We also need to be able to correct for the ascertainment bias that is introduced when a disease allele is preferentially sampled, with less attention paid to the normal allele.
It is proposed to correct for the ascertainment bias on disease alleles by treating the disease alleles as if they were a separate population, exchanging genetic material with the normal alleles by crossing-over and mutation. We have existing methods for coalescent likelihoods in geographically structured populations, and techniques from these can be adapted for this purpose. For some genetic markers, such as Single Nucleotide Polymorphisms, there are also ascertainment problems that arise because sites showing no polymorphism are not scored. It is proposed to use a simple correction to the likelihood to cope with this. We will write computer programs in C++ to compute the likelihoods and distribute them free over the Internet as source code, documentation, and executables.
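The step of resolving diploid genotypes into haplotypes can be illustrated with a minimal Python sketch. It is not the proposed C++ implementation, and the function names `haplotype_resolutions` and `flip_random_site` are hypothetical. The sketch assumes single-character alleles and an unphased genotype given as one allele pair per site. It enumerates every distinct haplotype pair consistent with a genotype, and shows one simple kind of move (swapping the phase at a random heterozygous site) that a sampler could use to wander among resolutions when full enumeration is too costly.

```python
import random
from itertools import product


def haplotype_resolutions(genotype):
    """List the distinct unordered haplotype pairs consistent with an
    unphased diploid genotype.

    genotype: list of 2-tuples of single-character alleles, one per site,
    e.g. [("A", "G"), ("C", "C"), ("T", "C")].
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    # Keep the first heterozygous site fixed so that (h1, h2) and (h2, h1)
    # are not counted as two different resolutions.
    free_sites = het_sites[1:]
    resolutions = []
    for flips in product((False, True), repeat=len(free_sites)):
        flip_at = dict(zip(free_sites, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        resolutions.append(("".join(hap1), "".join(hap2)))
    return resolutions


def flip_random_site(hap1, hap2, rng):
    """Propose a new resolution by swapping the two alleles at one randomly
    chosen heterozygous site (illustrative move only; assumed, not taken
    from the proposal)."""
    het = [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]
    if not het:
        return hap1, hap2  # fully homozygous: nothing to re-phase
    i = rng.choice(het)
    new1 = hap1[:i] + hap2[i] + hap1[i + 1:]
    new2 = hap2[:i] + hap1[i] + hap2[i + 1:]
    return new1, new2


if __name__ == "__main__":
    genotype = [("A", "G"), ("C", "C"), ("T", "C")]
    for pair in haplotype_resolutions(genotype):
        print(pair)
    print(flip_random_site("ACT", "GCC", random.Random(1)))
```

A genotype with k heterozygous sites has 2^(k-1) such resolutions for k of at least 1, so enumeration is practical only for a handful of heterozygous sites. That growth is the reason a random sampling stage over resolutions is attractive.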
Beerli, P (2004) Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol Ecol 13:827-36
Felsenstein, J (2001) Taking variation of evolutionary rates between sites into account in inferring phylogenies. J Mol Evol 53:447-55
Kuhner, M K; Felsenstein, J (2000) Sampling among haplotype resolutions in a coalescent-based genealogy sampler. Genet Epidemiol 19 Suppl 1:S15-21
Kuhner, M K; Beerli, P; Yamato, J et al. (2000) Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156:439-47