Detecting signals of selection can provide biological insights into adaptations that have shaped human history. Genetic variants and phenotypes that are highly differentiated between closely related subpopulations (historically viewed as a source of confounding due to population stratification) can enable the detection of natural selection on functionally important genes, which results from interaction between genes and environment. This approach can detect either selection on individual genetic variants or polygenic selection, which reflects the combined impact of environmental stimuli on many genetic variants that influence a trait. In either case, closely related subpopulations and very large samples are required for the approach to have sufficient power. The resulting insights are complementary to those obtained from genome-wide association studies (GWAS), but a challenge is that it is often unclear which subpopulations to compare. Here, we propose to address this challenge by analyzing population differentiation along axes of variation inferred using principal components analysis (PCA). We will apply this approach to identify genetic variants with unusual differentiation along top principal components (PCs), and to detect polygenic selection on phenotypes whose genetic values show unusual differentiation along top PCs. Our research will be driven by large empirical data sets, with a total of >700,000 samples with genetic data and rich phenotype data. Our development of methods to detect the action of natural selection in response to environmental stimuli will serve to elucidate the connection between genes and environment in human disease.
Most common diseases have a substantial genetic component, but genetic association studies have been only partially effective in identifying the underlying genetic risk variants. We and other researchers have previously shown that genes affecting disease risk are often highly differentiated between closely related populations as a consequence of interaction between genes and environment, and that detecting these signals of selection can help identify disease-associated variants. In this proposal, we will search for signals of selection by applying principal components analysis (PCA), a dimensionality reduction technique, to very large data sets.
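
As a purely illustrative sketch of scanning for variants with unusual differentiation along top PCs, the Python below standardizes a genotype matrix, takes its singular value decomposition, and scores each variant by its squared loading on each top PC. The function name `pc_differentiation_stats`, the scaling by the number of variants, and the comparison against a chi-square distribution with 1 degree of freedom are assumptions made for this example (they presume roughly Gaussian loadings under neutral drift); they are not the statistic or software of this proposal.

```python
import math
import numpy as np

def pc_differentiation_stats(genotypes, n_pcs=2):
    """Score each variant by its differentiation along the top PCs.

    genotypes: (n_snps, n_samples) array of 0/1/2 allele counts.
    Returns (stats, pvals, sample_pcs); stats and pvals have shape
    (n_snps, n_pcs), sample_pcs has shape (n_samples, n_pcs).
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[0]
    # Sample allele frequency of each variant.
    p = g.mean(axis=1, keepdims=True) / 2.0
    if np.any((p <= 0.0) | (p >= 1.0)):
        raise ValueError("remove monomorphic variants before calling")
    # Standardize each variant to mean 0 and binomial variance 1.
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors hold variant loadings; right ones hold sample PCs.
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    loadings = u[:, :n_pcs]
    # Columns of u have unit norm, so these scores average exactly 1 per PC.
    # ASSUMPTION: treat them as approximately chi-square(1) under the null.
    stats = n_snps * loadings ** 2
    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2)).
    pvals = np.vectorize(math.erfc)(np.sqrt(stats / 2.0))
    return stats, pvals, vt[:n_pcs].T
```

Variants with the smallest p-values along a given PC would be candidates for follow-up. In real data the scores would also need to account for linkage disequilibrium and multiple testing, which this sketch ignores.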