This project will develop computational methods for understanding the relationships among gene sequences from closely related species. The goal is to determine the extent to which the histories of individual genes reflect the descent history of species, and to devise methods for inferring the relationships of closely related species from the sequences of multiple genes. A mathematical theory will be developed that describes how the sequence patterns of a gene in different species reflect, or fail to reflect, the species relationships. The frequency of a particularly extreme form of gene tree/species tree discordance, in which the most probable genetic pattern in a set of species does not match the shape of the species tree, will be studied using computation and simulation, and an understanding of this extreme discordance will be incorporated into novel algorithms for inferring species trees.
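As a rough illustration of how a gene history can fail to reflect the species history, the sketch below considers the simplest case of three species under the standard coalescent model, where the probability that a gene tree matches the species tree is 1 - (2/3)e^(-t) for an internal branch of length t in coalescent units. It is not the project's method: the function names, the fixed seed, and the assumption of a constant population size are illustrative choices made here. With only three species the matching gene tree is always the most probable one, so this example shows ordinary discordance only; the extreme form described above requires four or more species.

```python
import math
import random


def concordance_probability(t):
    """Probability that a gene tree matches a three-species tree.

    t is the internal branch length in coalescent units (assumed >= 0).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t, n_genes, seed=0):
    """Estimate the same probability by simulating n_genes independent genes.

    Hypothetical helper: species tree ((A,B),C), one lineage sampled per species.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_genes):
        # Lineages from A and B coalesce at rate 1 on the internal branch.
        if rng.expovariate(1.0) < t:
            # They coalesced before reaching the root: gene tree is ((A,B),C).
            matches += 1
        elif rng.randrange(3) == 0:
            # All three lineages reach the root population; each of the
            # three pairs is equally likely to coalesce first, and only
            # the pair (A,B) reproduces the species tree.
            matches += 1
    return matches / n_genes


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        exact = concordance_probability(t)
        estimate = simulate_concordance(t, n_genes=100_000)
        print(f"t={t:.1f}  exact={exact:.4f}  simulated={estimate:.4f}")
```

For short internal branches the matching probability approaches 1/3, meaning that a single gene carries almost no information about the species tree, which is why sequences from multiple genes are needed.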
Trees describing the relationships between species are the basic structures on which the diversity of life can be understood, and they are an essential component of many areas of biology. The results and methods of this project will be made available as shared software that can be used for improved inference of species relationships from genetic data.