Phylogeny is fundamental to our understanding of biology and has translational applications to many areas of human health, including epidemiology, cancer biology, and immunology. Genome sequences from closely related species such as the great apes contain a wealth of information about their evolutionary history, including the species phylogeny and divergence times, population demography, and possible episodes of hybridization or admixture. However, extracting this information requires advanced probability models and efficient statistical and computational methods. This is because population genetic processes are stochastic, and sequences from closely related species are highly similar and therefore contain only weak historical information about some parameters. For this reason, it is critical to develop parametric statistical methods that maximize the information extracted from the data. In this project we aim to develop efficient Bayesian computational methods for the analysis of genome-scale datasets under the multispecies-coalescent-with-introgression (MSci) model. The proposed research will develop and implement novel algorithms and statistical methods in the program bpp to infer the number, directions, timings, and intensities of introgression events between species (Aim 1). The program will then naturally accommodate both deep coalescence and introgression in the model. This will also allow a novel Bayesian method to be developed for inferring, for each sequence of a diploid individual, the probability that particular loci (genomic regions) were introgressed from a particular species admixture event (Aim 2). This question is of broad relevance and has been a subject of intense interest with respect to hominid admixtures. Another useful extension will be the addition of ongoing migration between pairs of populations using an efficient new formulation of the migration model (Aim 3).
The method will provide estimates of migration rates, which are particularly relevant for designing safe CRISPR gene drive experiments in wild populations. The range of species to which the bpp program can be applied will be expanded by incorporating a more parameter-rich model of DNA substitution (GTR+G) that better accommodates multiple substitutions per site and is necessary for analyzing more distantly related species. Moreover, we will allow fossil calibrations and a relaxed molecular clock, incorporating into bpp the features of MCMCtree, our other program for divergence time estimation (Aim 4). Fossil calibrations will allow divergence times to be estimated in units of years rather than expected DNA substitutions. To broaden the accessibility of the program to users without command-line experience, we will further develop a cross-platform GUI for bpp (BPPg) using a modern JavaScript framework (Aim 5). Finally, the statistical performance of the method will be studied and compared with that of other methods (where they exist) by simulation and by analysis of paradigmatic datasets (Aim 6).
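The change of units mentioned for Aim 4 can be illustrated with a minimal sketch. It assumes a strict molecular clock, under which a divergence time measured in expected substitutions per site equals the substitution rate multiplied by the time in years. The function name and the numerical values below are hypothetical and are not taken from bpp or MCMCtree; a relaxed clock with fossil calibrations, as proposed in the project, is considerably more involved.

```python
def tau_to_years(tau, rate_per_site_per_year):
    """Convert a divergence time from expected substitutions per site to years.

    Assumes a strict clock: tau = rate * time, so time = tau / rate.
    Both arguments are illustrative inputs, not output of any real program.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return tau / rate_per_site_per_year


# Hypothetical values chosen only to show the arithmetic.
tau = 0.005   # divergence time in expected substitutions per site
rate = 1e-9   # substitutions per site per year
print(f"{tau_to_years(tau, rate):.3g} years")
```

In practice the rate is itself uncertain, which is why calibrated analyses report a posterior distribution of times rather than a single converted value.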

Public Health Relevance

The proposed research aims to develop new statistical methods and computer algorithms for analyzing genome sequences of individuals sampled from a single population or from different populations and species. The new methods will identify ancient episodes of interbreeding between populations or species, estimate admixture proportions and the timing of admixture events, and identify admixed regions in the genomes of individuals. Admixture of ancient humans with Neanderthals has affected human health, and identifying introgressed genomic regions may help in understanding and treating multiple human diseases.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM123306-02
Application #: 10087945
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Janes, Daniel E
Project Start: 2020-02-01
Project End: 2024-01-31
Budget Start: 2021-02-01
Budget End: 2022-01-31
Support Year: 2
Fiscal Year: 2021
Total Cost:
Indirect Cost:
Name: University of California Davis
Department: Anatomy/Cell Biology
Type: Schools of Medicine
DUNS #: 047120084
City: Davis
State: CA
Country: United States
Zip Code: 95618