A better understanding of the degree of transferability of genetic association results and implicated genes across populations has implications for precision medicine and can only be achieved by studying the genetic architecture of complex traits in diverse populations. For many complex traits, gene regulation is likely to play a crucial mechanistic role, given the consistent enrichment of regulatory variants among trait-associated variants. We have developed a gene-level association method called PrediXcan that harnesses the regulatory knowledge generated by expression quantitative trait loci (eQTL) studies to directly test for genes associated with complex traits. An advantage of this gene-based approach over other aggregate variant approaches is that the results are inherently mechanistic and provide directionality, guiding follow-up experiments and future drug development. The genetic contribution to population phenotypic differentiation is driven by differences in causal allele frequencies, effect sizes, and genetic architectures. We propose to broaden the scope of PrediXcan to include diverse populations by (1) optimizing predictors of gene expression within and across diverse populations in multiple tissues and (2) performing gene-level association studies and quantifying regulability on a range of phenotypes in non-European populations. We will use machine learning to optimize predictive models of gene expression in datasets with both genome-wide genotype and gene expression data, integrating prior results from larger European populations when appropriate. Based on preliminary results, we expect predictive power (assessed by cross-validated R²) to vary across genes, depending on the heritability of each gene expression trait and on differences in allele frequencies and effect sizes among populations.
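The two-step logic described above (predict expression from genotype, then test the predicted expression against a trait) can be sketched roughly as follows. This is a minimal illustration and not the PrediXcan implementation: the weights, dosages, and phenotype are simulated, the function names are hypothetical, and a plain Pearson correlation with its t statistic stands in for whatever association test a real analysis would use.

```python
import math
import random


def predict_expression(dosages, weights):
    """Predicted expression per person: weighted sum of allele dosages (0, 1, or 2)."""
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_test(predicted, phenotype):
    """Return (r, t): the correlation and its t statistic on n - 2 degrees of freedom."""
    r = pearson_r(predicted, phenotype)
    n = len(predicted)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Simulated example; every value below is made up for illustration.
rng = random.Random(0)
weights = [0.4, -0.3, 0.2, 0.1, -0.2]  # hypothetical eQTL-derived weights for one gene
allele_freq = 0.3                      # assumed allele frequency for every variant
dosages = [
    [sum(rng.random() < allele_freq for _ in range(2)) for _ in weights]
    for _ in range(500)
]
predicted = predict_expression(dosages, weights)
# Simulated phenotype that rises with predicted expression, plus noise.
phenotype = [g + rng.gauss(0.0, 1.0) for g in predicted]

r, t = association_test(predicted, phenotype)
print(f"r = {r:.3f}, t = {t:.2f}")  # the sign of r gives the direction of effect
```

The sign of the correlation is what gives a gene-level result its directionality: a positive value suggests that higher predicted expression goes with a higher trait value, a negative value the reverse.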
We will compare populations by (1) calculating the correlation between heritability estimates and cross-validated prediction performance and (2) calculating trans-population genetic effect size correlations (allele frequency independent) and trans-population genetic impact correlations (allele frequency dependent). The optimal models will also inform the underlying genetic architectures (sparse vs. polygenic) of gene expression traits and how they vary across populations. As we have done for European populations, the predictive models and heritability estimates developed here will be added to an open access database for use in PrediXcan and other studies. We hypothesize that PrediXcan will increase power to identify genes and implicate mechanisms underlying complex traits and that we can quantify the overall phenotypic variation explained by transcriptome regulation within and across populations. We will compare gene-level results across populations to determine whether the same or unique genes and pathways are implicated for a particular phenotype. We will estimate the proportion of phenotypic variance explained collectively by all gene expression levels, which we name the regulability of a trait. All results, scripts, and software will be available in publicly accessible databases and repositories.
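The distinction between the allele frequency independent and allele frequency dependent correlations can be illustrated with a small sketch. It assumes, as one common reading of these terms, that the effect size correlation compares per-allele effects directly, while the impact correlation compares effects scaled by the standard deviation of allele dosage, sqrt(2p(1-p)), in each population. The function names and all numbers are hypothetical, and the sketch treats the true effects as known; estimators used in practice work from noisy estimates and must account for linkage disequilibrium.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def allele_sd(freq):
    """Standard deviation of allele dosage under Hardy-Weinberg equilibrium."""
    return math.sqrt(2.0 * freq * (1.0 - freq))


def effect_correlation(beta1, beta2):
    """Correlation of per-allele effects (does not depend on allele frequency)."""
    return pearson_r(beta1, beta2)


def impact_correlation(beta1, freq1, beta2, freq2):
    """Correlation of effects scaled by each population's allele dosage SD."""
    impact1 = [b * allele_sd(f) for b, f in zip(beta1, freq1)]
    impact2 = [b * allele_sd(f) for b, f in zip(beta2, freq2)]
    return pearson_r(impact1, impact2)


# Hypothetical variants with identical per-allele effects in two populations
# but different allele frequencies.
beta = [0.5, -0.2, 0.3, 0.1]
freq_pop1 = [0.10, 0.50, 0.30, 0.20]
freq_pop2 = [0.40, 0.05, 0.30, 0.50]

rho_effect = effect_correlation(beta, beta)
rho_impact = impact_correlation(beta, freq_pop1, beta, freq_pop2)
print(f"effect correlation = {rho_effect:.3f}, impact correlation = {rho_impact:.3f}")
```

In this toy case the per-allele effects are perfectly correlated, yet the impact correlation falls below one because the same allele contributes a different amount of variance in each population.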

Public Health Relevance

Differences in DNA sequence among individuals can lead to differences in gene expression levels, which in turn can lead to trait differences. We have developed a method that harnesses these DNA differences to predict gene expression levels, which are then tested for correlation with a disease or other trait of interest. Our project will lead to a better understanding of the degree of transferability of genetic association results and implicated genes across populations, which has the potential to improve the implementation of precision medicine among diverse populations and reduce health disparities.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Academic Research Enhancement Awards (AREA) (R15)
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Struewing, Jeffery P
Loyola University Chicago
Schools of Arts and Sciences
United States
Andaleon, Angela; Mogil, Lauren S; Wheeler, Heather E (2018) Gene-based association study for lipid traits in diverse cohorts implicates BACE1 and SIDT2 regulation in triglyceride levels. PeerJ 6:e4314
Mogil, Lauren S; Andaleon, Angela; Badalamenti, Alexa et al. (2018) Genetic architecture of gene expression traits across diverse populations. PLoS Genet 14:e1007586