The advent of methods for economically sequencing entire genomes has ushered in the field of population genomics. Although whole-genome sequencing has the potential to yield estimates of population-genetic parameters with unprecedented accuracy, the methods essential to the analysis of such data have lagged far behind. The proposed work will develop a general statistical framework for the analysis of population-genomic data. The general strategy is to derive and computationally validate a set of efficient estimators for population-genetic parameters at three levels: individual genomes, multiple individuals within populations, and multiple populations. Specific subprojects include measuring patterns of variation and covariation among nucleotide sites, estimating levels of population subdivision, and developing novel methods to facilitate genome assembly and the refinement of genetic maps. Considerable emphasis will be placed on the development of efficient estimation algorithms for use by the genomics research community.
These methods are expected to be widely applicable in both applied and basic research. Special attention will be devoted to identifying optimal sampling strategies, including the tradeoffs between depth of sequence coverage per individual and number of individuals, and among the length, number, and quality of reads. As a consequence, the resulting methods should enable investigators to extract the maximum possible information from their existing data sets, while also guiding future study designs that maximize informational yield per unit of sampling and sequencing effort. The software to be developed will be permanently housed and freely available at the National Center for Genome Analysis Support, and workshops will be held to assist the user community in implementing these tools. The Plant Genome Research Project is co-funding this research.