Phylogenetic inference methods are fundamental to our understanding of the evolutionary relationships among all units of life, from genes and gene copies to individuals, populations, and species. Because most genomes are composed of a mosaic of recombined segments inherited from different ancestors, genealogical relationships vary across different regions of the genome. This variation has been the primary focus of new methods developed over the last two decades in the field of phylogenetic systematics. However, the growing availability of chromosome-scale genome sequence data presents many new opportunities and challenges for existing methods, which have traditionally focused on genealogical patterns among a few statistically independent gene regions, as opposed to many contiguous and non-independent regions. This project involves the development of new methods for linking phylogenetic inference at genome-wide and local genealogical scales. These new methods will bring increased power to disentangle phylogenetic relationships among rapidly radiating clades, thus contributing to our understanding of biodiversity and the ecological mechanisms that generate diversity. This work also supports training opportunities for students, as well as the development of several didactic software tools for teaching evolutionary genomics, which will form the basis for a new online and interactive textbook. Together, these tools will promote increased interactions between the phylogenetics and broader data science communities.
This project supports a number of research and educational approaches for linking phylogenetic inference at genome-wide and local genealogical scales. These approaches aim (1) to improve phylogenetic network estimation from unlinked SNPs, as opposed to error-prone gene trees; (2) to improve local genealogical inference in species-level phylogenies using a novel Bayesian machine-learning approach that incorporates prior information from recombination maps and a parameterized species tree or network; (3) to develop new genomic resources in the highly diverse plant clade Pedicularis to enable the application of genome-wide phylogenetic inference methods to investigate selection and introgression; (4) to expand population-level genomic sampling across dozens of species of Pedicularis in a biodiversity hotspot, including historical samples, to investigate spatial and temporal factors affecting genetic diversity and endemism within and across species; (5) to develop a free online textbook based on interactive coding exercises to teach phylogenetic methods through modern data science techniques in Python; and (6) to develop a new course in which students will be involved in the process of researching and generating content for this textbook.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.