This proposal will continue the development of methods for making likelihood inferences from population samples of molecular sequences, where there may be recombination within the sequences and natural selection affecting some sites. Molecular and population biologists continue to collect increasing amounts of such data; as in other areas, likelihood methods will become important in their analysis. Population biologists want to use the sequences to understand population history and evolutionary forces, while molecular biologists want to use within-species variation to understand which regions of a molecule are constrained from varying. It is proposed to develop statistical methods for approximately computing the likelihoods of population parameters, such as effective population sizes and population growth rates; genetic parameters, such as the rate of recombination; and patterns of natural selection, such as balancing selection or directional selection acting at particular sites. The likelihood of a set of parameter values can be computed by summing, over all the possible recombining genealogies connecting the members of an observed sample, the probability of the data given each genealogy weighted by the probability of that genealogy under the parameter values. While there are far too many genealogies to do the sum exactly, Markov chain Monte Carlo methods such as the Metropolis-Hastings method can be used to draw a large enough random sample of genealogies, which are then used to estimate the likelihood curves. Previous work has developed algorithms for recombining sequences; the present proposal will develop new methods for calculating the likelihoods in the presence of natural selection at specific sites. It will also integrate the existing methods into a usable whole which will allow biologists to construct an analysis of the particular combination of evolutionary forces that they want to consider. The methods are computer intensive; they will be made available, free, over the Internet, as the LAMARC package of programs distributed in C source code and as executables.
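The sampling idea described above can be illustrated with a deliberately tiny sketch. It is not the LAMARC implementation: it assumes a sample of only two sequences, no recombination and no selection, so that a "genealogy" reduces to a single coalescence time t. Further assumptions made only for this illustration: t is measured in mutational units, the number of observed differences is Poisson with mean 2t, and the prior density of t given the parameter theta is (2/theta) exp(-2t/theta). All function names are hypothetical. A random-walk Metropolis-Hastings sampler draws coalescence times at a driving value theta0, and the average ratio of prior densities over those draws estimates the relative likelihood L(theta)/L(theta0).

```python
import math
import random


def log_prior(t, theta):
    """Log density of the coalescence time t given theta (toy two-sequence model)."""
    return math.log(2.0 / theta) - 2.0 * t / theta


def log_posterior(t, n_diffs, theta0):
    """Unnormalized log of P(data | t) * P(t | theta0); -inf outside t > 0."""
    if t <= 0.0:
        return float("-inf")
    # Poisson(2t) log-likelihood with the constant log(n_diffs!) dropped.
    log_lik = n_diffs * math.log(2.0 * t) - 2.0 * t
    return log_lik + log_prior(t, theta0)


def sample_genealogies(n_diffs, theta0, n_steps, burn_in, step_sd, rng):
    """Random-walk Metropolis-Hastings over t; returns n_steps draws after burn-in."""
    t = 1.0
    current = log_posterior(t, n_diffs, theta0)
    samples = []
    for i in range(n_steps + burn_in):
        proposal = t + rng.gauss(0.0, step_sd)  # symmetric proposal
        candidate = log_posterior(proposal, n_diffs, theta0)
        # Accept with probability min(1, posterior ratio); t <= 0 is never accepted.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            t, current = proposal, candidate
        if i >= burn_in:
            samples.append(t)
    return samples


def relative_likelihood(samples, theta, theta0):
    """Estimate L(theta) / L(theta0) from genealogies sampled at theta0."""
    ratios = [math.exp(log_prior(t, theta) - log_prior(t, theta0)) for t in samples]
    return sum(ratios) / len(ratios)


def exact_likelihood(theta, n_diffs):
    """Closed form for this toy model only: theta^S / (1 + theta)^(S + 1)."""
    return theta ** n_diffs / (1.0 + theta) ** (n_diffs + 1)


if __name__ == "__main__":
    rng = random.Random(12345)
    draws = sample_genealogies(n_diffs=3, theta0=1.0, n_steps=60000,
                               burn_in=1000, step_sd=0.5, rng=rng)
    for theta in (0.5, 1.0, 2.0, 4.0):
        estimate = relative_likelihood(draws, theta, 1.0)
        exact = exact_likelihood(theta, 3) / exact_likelihood(1.0, 3)
        print(f"theta={theta}: estimated {estimate:.3f}, exact {exact:.3f}")
```

In this toy case the sum over genealogies has a closed form, which makes it possible to check the sampler; with many sequences, recombination, or selection no such closed form is available, and the estimate becomes less reliable as theta moves away from the driving value theta0.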