Our understanding of natural selection in humans has been limited to indirect statistical inferences and to experiments in distantly related model organisms or in cell lines. We propose a new approach: identifying loci that currently affect survival to a given age (i.e., viability) by mining the large biomedical data sets now available. Our idea is to look for variants and sets of variants whose frequencies change across birth cohorts and generations more than expected by chance.
Aim 1: We will identify variants that impact survival using genetic data from large cohort studies. We plan to examine changes in allele frequencies across birth cohorts in >1 million individuals genotyped or resequenced genome-wide, starting with >200,000 genotyped samples from GERA and the UK Biobank. Controlling for population structure, we will assess (i) whether allele or genotype frequencies change with age more than expected by chance; (ii) whether these trends differ by sex; and (iii) whether there is evidence for a trade-off between effects at young and old ages. We will perform these tests for single loci throughout the genome, as well as jointly for all loci in a given annotation (e.g., putatively damaging amino acid mutations). To examine current selection pressures on quantitative traits, we will consider sets of variants previously associated with over 40 quantitative traits (e.g., diabetes risk or height) and ask how the polygenic score for each trait varies with age and sex.
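As a rough illustration of test (i), one simple approach (our assumption here, not necessarily the model the project will use) is an ordinary least-squares regression of each individual's genotype dosage on age, with genetic principal components as covariates to absorb population structure. A clearly negative age slope means the allele is rarer among older individuals. The same test can be applied to a polygenic score, computed below as a weighted sum of dosages with hypothetical per-variant weights. The function names `age_trend_test` and `polygenic_score`, and the simulated data, are illustrative only.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages: rows are individuals, columns are variants."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def age_trend_test(y, age, covariates=None):
    """OLS regression of y (a genotype dosage or a polygenic score) on age.

    Optional covariates (e.g. genetic principal components) are added as extra
    columns. Returns the age slope and its t-statistic.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    n = y.shape[0]
    columns = [np.ones(n), age]  # intercept, then age (coefficient index 1)
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(n, -1)
        columns.extend(cov.T)  # one design-matrix column per covariate
    X = np.column_stack(columns)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - rank)  # residual variance
    se_age = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se_age


# Usage on simulated data in which allele frequency declines with age by construction.
rng = np.random.default_rng(0)
age = np.repeat([40.0, 50.0, 60.0, 70.0], 250)
freq = 0.5 - 0.008 * (age - 40)      # hypothetical frequency trend
dosage = rng.binomial(2, freq)        # 0/1/2 genotypes
pcs = rng.normal(size=(age.size, 2))  # stand-in for genetic principal components
slope, t_stat = age_trend_test(dosage, age, covariates=pcs)
```

Running `age_trend_test` separately within each sex gives a crude version of test (ii). A real analysis would also need to separate true viability effects from birth-cohort differences in ancestry or genotyping, which this sketch does not attempt.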
Aim 2: We will identify variants that influence survival to adulthood or transmission odds in trio data. We propose to test for unequal transmission of alleles from heterozygous parents to surviving children or young adults in >35,000 trios that have been genotyped or resequenced genome-wide. This data set provides high power to detect even moderate effects of selection acting early in life (at haploid or diploid life stages), as well as subtle cases of meiotic drive. We will consider males and females separately as well as jointly, testing for distortion at each SNP and each haplotype block. We will also examine sets of loci that contribute to the same quantitative phenotype.
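Unequal transmission from heterozygous parents is classically tested with the transmission disequilibrium test (TDT): among heterozygous parents, count b transmissions and c non-transmissions of a given allele, and compare (b - c)^2 / (b + c) to a chi-square distribution with one degree of freedom. The sketch below is a minimal version, assuming phased trio data in which the allele each parent transmitted is known; the per-group split (e.g. by parent sex) is a hypothetical stand-in for the sex-specific analyses, and the function names are illustrative. It is one standard statistic, not necessarily the one the project will use.

```python
import math


def tdt(parent_dosage, transmitted):
    """Transmission disequilibrium test for one biallelic SNP.

    parent_dosage: each parent's count (0/1/2) of the tested allele.
    transmitted: 1 if that parent passed the tested allele to the child, else 0.
    Returns (b, c, chi-square statistic, p-value with 1 degree of freedom).
    """
    b = c = 0
    for g, t in zip(parent_dosage, transmitted):
        if g != 1:
            continue  # only heterozygous parents are informative
        if t == 1:
            b += 1
        else:
            c += 1
    if b + c == 0:
        return b, c, float("nan"), float("nan")
    stat = (b - c) ** 2 / (b + c)
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return b, c, stat, p_value


def tdt_by_group(parent_dosage, transmitted, group):
    """Run the TDT jointly ("all") and separately within each group label."""
    results = {"all": tdt(parent_dosage, transmitted)}
    for label in sorted(set(group)):
        idx = [i for i, gr in enumerate(group) if gr == label]
        results[label] = tdt([parent_dosage[i] for i in idx],
                             [transmitted[i] for i in idx])
    return results
```

For example, `tdt_by_group(dosages, transmitted, parent_sex)` returns joint, maternal and paternal results in one dictionary. Haplotype-block and polygenic versions would aggregate transmissions over several loci, which this single-SNP sketch leaves out.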
Aim 3: We will relate current genetic variation to long-term selection pressures. By extending a statistical model that we recently developed, we will assess whether the set of variants that influences susceptibility to a given disease or anthropometric trait is enriched for signatures of positive, negative or balancing selection. This approach will allow us to ask (i) which selective pressures, if any, have acted on each of the more than 40 quantitative phenotypes for which we have collated mapping results, and (ii) whether loci that currently affect development and aging (identified in Aims 1 and 2) show signals of balancing or purifying selection. In a separate analysis, we will examine which quantitative traits have contributed to local adaptation since human populations diverged. This research will help to identify variation that influences development and aging, aiding individual prognosis and the understanding of disease genetics. Moreover, it will provide the first well-powered, comprehensive look at viability selection in extant humans and its relationship to long-term selective pressures.
We consider shifts in allele frequencies over birth cohorts and generations in order to identify sets of loci that influence development and aging in humans. In addition, we relate current variation in biomedical and anthropometric traits to signals of natural selection. This work will inform individual disease prognosis and deepen our understanding of recent natural selection in humans.