In the field of population genetics, machine learning methods are emerging as promising frameworks for understanding evolution. However, these algorithms rely heavily on simulated datasets, which currently fail to recapitulate the features of diverse natural genomes. Deep neural networks in particular are disconnected from evolutionary modeling, and their results are difficult to interpret in a biological context. In this project, we propose to develop simulation frameworks that automatically adapt to any population or species. The resulting customized synthetic datasets will be used to train neural networks that quantify the unique evolutionary histories of understudied human groups. By including genealogical and epigenetic information as auxiliary input, we will be able to link predictions back to genomic features. Our results will enable us to estimate the interactions between local phenomena such as natural selection, mutation patterns, and recombination hotspots. Taken together, outcomes from our work will allow us to create a detailed model evolutionary of processes, both along the genome and across human populations.
In population genetics, machine learning methods are emerging as promising frameworks for understanding evolution. However, it is difficult to apply these algorithms to understudied populations, as they are reliant on custom simulations, difficult to interpret, and disconnected from evolutionary modeling. The goals of this project are to develop simulation frameworks that automatically adapt to diverse datasets, allowing us to study evolutionary forces along the genome and across human populations.