The discordant signals obtained from analyses of different genes have raised new challenges for biologists. When contradictory evolutionary relationships are detected across different parts of the genome, it is difficult to recover and describe the history of a group of species. But this discordance among genes has also created new opportunities: hybridization, gene flow, and other biological processes each leave their own signature in the pattern of variation among gene trees. The goal of this project is to estimate the discordance among gene trees and to infer the biological processes that caused it in species and populations. An interdisciplinary team of statisticians, biologists, and computer scientists will develop new, robust statistical methods for analyzing the relationships among genes and species.
The tools developed through this project will be implemented in a user-friendly program so that any phylogeneticist can use them. The software will be freely distributed so that these tools can be applied to other groups of species. Undergraduate and graduate students will be trained at the interface of biology, statistics, and computer science. Insights from this research will enhance educational modules on evolutionary theory for undergraduate courses and K-12 teacher workshops.
Intellectual Merit
We developed advanced statistical methods to learn about the past history of species from the DNA of organisms alive today. Genes are the building blocks of organisms, and we can compare the same gene across various organisms. The patterns of similarity and dissimilarity among organisms tell us about the gene's history: which organisms share a very recent common ancestor, and which are more distantly related. Various biological processes can cause different genes to have different histories, such as recombination in sexual species, hybridization between closely related species, or horizontal gene transfer between distantly related species. We developed rigorous statistical methods to study the differences between gene histories and to reconcile them with the past history of species. Our methods account for uncertainty in the inferred genealogy of each gene, which arises because each gene provides only a limited amount of DNA sequence data. Our methods combine information across multiple genes to detect past events, such as gene flow between ancient, distantly related species. We also developed methods for genes that underwent duplications and losses, and methods applicable to polyploid species (species containing extra copies of each chromosome). In wild potatoes, for instance, we were able to determine the origin of multiple polyploid species, despite a complicated history in which the inferred histories of individual genes varied greatly. We discovered new polyploid origins that agree with morphological data and suggest new hypotheses about the geographic dispersal of wild potatoes.
Broader Impacts
The methods we developed are applicable to many groups of organisms and have already been applied by many other researchers to their own study systems. We implemented these methods in software that is freely available to the research community.
Our software is open-source, so that others can further improve on these methods. The project contributed to the training of 11 students and postdoctoral scholars, who gained expertise in all three areas: biology, statistics, and computing. This project helped develop a workforce eager and able to cross disciplinary boundaries.