The number of sequenced organisms continues to grow exponentially, providing evolutionary biologists with unprecedented resolution for mapping natural selection in the genome. Comparative genomic methods identify function by searching for genomic elements that are constrained by natural selection. Modern comparative datasets are saturated with substitutions accumulated over many millions of years of evolution, allowing functional elements to be identified at the resolution of a few base pairs. Since comparative methods rely on sequence conservation as evidence of natural selection, substitutions caused by fluctuations in the strength of selection or rare adaptive events can be mistaken for a lack of function. Population genomic data are robust to these issues and are, in principle, the best way to measure constraint, but they suffer from low per-site densities of polymorphism, limiting the resolution at which function can be studied. Here, we propose a comparative population genomics approach that addresses these limitations by combining polymorphism data across multiple species. Specifically, we will create an unprecedented dataset of genome assemblies and population polymorphism data from up to 100 individuals from each of 100 species of the model system of fruit flies (family Drosophilidae).
In Aim 1, we will map selective constraint at a resolution of fewer than 3 base pairs and use these maps to test how constraint evolves across a clade.
In Aim 2, we will develop a new test for adaptive evolution that jointly utilizes substitution and polymorphism data from multiple species and test whether adaptation acts on the same genes across drosophilids. Successful completion of the project will contribute significantly to the emerging field of comparative population genomics by providing an important publicly available genomic dataset for the scientific community, new ways to design and analyze large sequencing experiments, and tests of fundamental assumptions about the relationship between evolution in populations and evolution over macroevolutionary time scales. The primary goal of this NRSA F32 fellowship is to provide me with the scientific and professional foundation to become a leader in the new field of comparative population genomics as an independent researcher. My long-term scientific goal is to lead an independent research group that utilizes comparative genomics and population genetics tools to bridge micro- and macroevolutionary processes. As a postdoctoral fellow in the Petrov Lab at Stanford, I will receive new scientific training in wet lab skills, the design of sequencing experiments, and comparative genomics tools by generating and analyzing a genomic dataset that will be the first of its kind. I will receive substantial training in professional leadership and network building by leading the effort to build this large genomic resource for the scientific community.
By relating patterns of genetic variation between species to natural selection, comparative genomics provides a powerful way to understand biological function in the genome at high resolution. Because comparative approaches can be sensitive to fluctuations in natural selection, we propose multi-species polymorphism as a robust alternative approach to mapping natural selection. This research proposal will generate publicly available genome assemblies and population resequencing data for 100 species of the model group Drosophilidae (fruit flies), and develop new approaches that utilize divergence and polymorphism data from many species to map natural selection at high resolution.