Detecting signals of selection can provide biological insights into adaptations that have shaped human history. Genetic variants and phenotypes that are highly differentiated between closely related subpopulations (historically viewed as a source of confounding due to population stratification) can enable the detection of natural selection on functionally important genes, which results from interaction between genes and environment. This approach can detect either selection on individual genetic variants or polygenic selection, which reflects the combined impact of environmental stimuli on many genetic variants that influence a trait. In either case, the approach requires closely related subpopulations and very large samples to have sufficient power. The resulting insights are complementary to those obtained from genome-wide association studies (GWAS), but a challenge is that it is often unclear how to select subpopulations to compare. Here, we propose to address this challenge by analyzing population differentiation along axes of variation inferred using principal components analysis (PCA). We will apply this approach to identify genetic variants with unusual differentiation along top PCs, and to detect polygenic selection on phenotypes whose genetic values show unusual differentiation along top PCs. Our research will be driven by large empirical data sets, with a total of >700,000 samples with genetic data and rich phenotype data. Our development of methods to detect the action of natural selection in response to environmental stimuli will serve to elucidate the connection between genes and environment in human disease.
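
To make the idea of "unusual differentiation along top PCs" concrete, the following is a minimal sketch under stated assumptions, not the method developed in this project. It assumes a small in-memory genotype matrix of 0/1/2 allele counts with no missing data, and it assumes that, under neutral drift, each SNP's squared loading on a PC, rescaled by the number of SNPs, behaves approximately like a chi-square variable with 1 degree of freedom. The function name `pc_differentiation_scan` and its parameters are hypothetical.

```python
import math

import numpy as np


def pc_differentiation_scan(genotypes, n_pcs=2):
    """Score each SNP for differentiation along the top principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns (stats, pvalues), each with shape (n_pcs, n_snps).
    Assumption: under neutral drift, stats are roughly chi-square with 1 d.o.f.
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]

    # Sample allele frequency of each SNP.
    freq = g.mean(axis=0) / 2.0
    if np.any((freq <= 0.0) | (freq >= 1.0)):
        raise ValueError("remove monomorphic SNPs before scanning")

    # Center each SNP and scale it by its binomial standard deviation.
    x = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))

    # SVD of the normalized matrix: rows of vt are unit-norm SNP loadings,
    # ordered by decreasing singular value (top PCs first).
    _, _, vt = np.linalg.svd(x, full_matrices=False)

    # Rescale squared loadings so that each PC's statistics average to 1,
    # which matches the mean of a chi-square variable with 1 d.o.f.
    stats = n_snps * vt[:n_pcs] ** 2

    # Survival function of a 1-d.o.f. chi-square: P(X > s) = erfc(sqrt(s / 2)).
    pvalues = np.vectorize(math.erfc)(np.sqrt(stats / 2.0))
    return stats, pvalues
```

SNPs with very small p-values along a given PC would be candidates for follow-up. In practice, a scan at the scale described here would need a scalable PCA implementation, linkage disequilibrium handling, and multiple-testing correction, none of which this sketch attempts.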

Public Health Relevance

Most common diseases have a substantial genetic component, but genetic association studies have been only partially effective in identifying the underlying genetic risk variants. We (and other researchers) have previously shown that genes affecting disease risk are often highly differentiated between closely related populations as a consequence of interaction between genes and environment, and that detection of these signals of selection can help identify disease-associated variants. In this proposal, we will search for signals of selection by applying principal components analysis (PCA), a dimensionality reduction technique, to very large data sets.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Small Research Grants (R03)
Project #: 5R03ES027902-02
Application #: 9609452
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: McAllister, Kimberly A
Project Start: 2018-04-01
Project End: 2021-03-31
Budget Start: 2019-04-01
Budget End: 2021-03-31
Support Year: 2
Fiscal Year: 2019

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115