In anticipation of the imminent release of population samples of genomes, this project develops an integrated theoretical, statistical, and computational framework for the analysis of genome-wide variation. It develops evolutionary models and statistical procedures for the analysis of genome-wide associations among single nucleotide polymorphism (SNP) loci. The methodology will provide a framework both for describing genomic SNP variation and for inferring the demographic and evolutionary processes that generated the observed patterns. A major component entails deriving the sampling distributions of summary statistics of genome-scale variation under various forms of population structure, including regular systems of inbreeding and population subdivision. We will explicitly address the effect of linkage on associations among nucleotide positions. We will also develop a novel method for detecting genomic tracts that appear to have had unusual evolutionary histories; its distinguishing feature is the use of ensembles of multilocus association measures. We will apply our methodologies to the population samples of genomes that are just now beginning to appear. Of particular significance are the Drosophila Genetics Reference Panel, the 1000 Genomes Project, and yeast genomes presently being sequenced by our collaborators.
This project develops statistical procedures for the analysis of genome-wide patterns of variation observed in samples of genomes. The resulting framework lays a basis for accounting for population structure in association mapping of factors contributing to disease and other important phenotypes. A comprehensive partitioning of genome-wide variation will be developed and used as a basis for inferring historical demographic events and ongoing evolutionary processes.