In the field of population genetics, machine learning methods are emerging as promising frameworks for understanding evolution. However, these algorithms rely heavily on simulated datasets, which currently fail to recapitulate the features of diverse natural genomes. Deep neural networks in particular are disconnected from evolutionary modeling, and their results are difficult to interpret in a biological context. In this project, we propose to develop simulation frameworks that automatically adapt to any population or species. The resulting customized synthetic datasets will be used to train neural networks that quantify the unique evolutionary histories of understudied human groups. By including genealogical and epigenetic information as auxiliary input, we will be able to link predictions back to genomic features. Our results will enable us to estimate the interactions between local phenomena such as natural selection, mutation patterns, and recombination hotspots. Taken together, outcomes from our work will allow us to create a detailed model of evolutionary processes, both along the genome and across human populations.

Public Health Relevance

In population genetics, machine learning methods are emerging as promising frameworks for understanding evolution. However, it is difficult to apply these algorithms to understudied populations because they rely on custom simulations, are difficult to interpret, and are disconnected from evolutionary modeling. The goal of this project is to develop simulation frameworks that automatically adapt to diverse datasets, allowing us to study evolutionary forces along the genome and across human populations.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Academic Research Enhancement Awards (AREA) (R15)
Project #
1R15HG011528-01
Application #
10114449
Study Section
Genetic Variation and Evolution Study Section (GVE)
Program Officer
Sofia, Heidi J
Project Start
2020-12-18
Project End
2023-11-30
Budget Start
2020-12-18
Budget End
2023-11-30
Support Year
1
Fiscal Year
2021
Total Cost
Indirect Cost
Name
Haverford College
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
002502615
City
Haverford
State
PA
Country
United States
Zip Code
19041