A deeper understanding of how well genetic association results and the biological mechanisms they implicate transfer across populations is essential for equitable implementation of precision medicine, and it can only be achieved by studying the genetic architecture of complex traits in diverse populations. In our initial project period, we showed that the genetic correlation of gene expression depends on shared ancestry proportions in African American, Hispanic, and European populations. We identified a subset of genes that are well predicted in one population but poorly predicted in another, and showed that these differences are due to allele frequency differences between populations. Our results demonstrate that, when predicted expression levels are compared with observed levels, prediction is best when the training population balances ancestry similar to that of the test population against total sample size. Our studies of lipid traits in Yoruba, Filipino, and Hispanic populations uncovered key genes likely regulated by variants that are monomorphic or rare in European populations, demonstrating why studies in diverse populations are crucial. We have optimized genetic prediction models of gene expression levels in diverse populations and have thereby broadened the scope of PrediXcan. In this proposal, we seek to (1) optimize global and local ancestry-aware omics trait prediction models within and across diverse populations and (2) predict the intermediate omics traits and perform poly-omic PrediXcan analyses of complex traits in GWAS cohorts from diverse populations. We have gathered data on multiple omics traits from diverse populations for this project (genome-wide genotype, RNA-Seq, methylomics, metabolomics, and microbiome). We will use machine learning to optimize genotypic prediction models of gene expression levels, splicing ratios, methylation, metabolite levels, and microbial diversity. 
We expect to observe a range of predictive power across omics traits, depending on the heritability of each trait and on differences in allele frequencies and effect sizes among populations. We will integrate regulatory data and previous results from larger European populations when appropriate to prioritize functional variants in our prediction models. For each omics trait, we will survey its genetic architecture to inform the best prediction models. Our models will account for global and local ancestry, and we will quantify the ancestry-specific components of each omics trait. We will test the predicted omics traits for association with phenotypes of interest using either raw genotypes or summary statistics. We will use colocalization methods to determine whether the SNPs driving each omics trait prediction model are also those most associated with the phenotype and thus most likely to be causal. We will combine predicted omics traits in poly-omic models to determine which genes and biological pathways are implicated for a particular phenotype. Our team is well positioned to perform novel PrediXcan-based analyses of omics traits in diverse populations and will maximize impact by making our scripts, models, and results publicly available.
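
The two-step logic described above (predict an omics trait from genotype, then test the predicted trait for association with a phenotype) can be illustrated with a minimal sketch. This is not the PrediXcan software or the project's trained models: the function names, SNP weights, dosages, and phenotype values below are all hypothetical toy values chosen for illustration, and a simple linear regression stands in for the association test.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Predicted trait per individual: sum over SNPs of weight * allele dosage."""
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]

def association(predicted, phenotype):
    """Slope and t-statistic from a simple linear regression of
    phenotype on the predicted trait."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted trait has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(predicted, phenotype))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se

# Hypothetical toy inputs (assumptions, not real data):
# per-SNP weights from some previously trained prediction model
weights = [0.5, -0.2, 0.1]
# allele dosages (0, 1, or 2) for six individuals at the same three SNPs
dosages = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 2],
    [0, 0, 1],
    [2, 1, 0],
]
# phenotype values for the same six individuals
phenotype = [1.1, 1.9, 3.2, 1.4, 1.0, 2.7]

predicted = predict_expression(dosages, weights)
slope, t_stat = association(predicted, phenotype)
print(predicted, slope, t_stat)
```

In this sketch, a SNP that is monomorphic in a population contributes the same amount to every individual's predicted trait, so it adds no variance there; this is one simple way to see why weights trained in one population may predict poorly in another.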

Public Health Relevance

Differences in DNA sequence among individuals can lead to differences in omics traits like gene expression, splicing, methylation, metabolite levels, and microbial diversity, which in turn can lead to trait differences. We have developed a method that harnesses DNA differences to predict omics traits, which are then tested for association with a disease or other trait of interest. Our project will lead to a better understanding of the degree of transferability of genetic association results across populations, which has the potential to improve the implementation of precision medicine among diverse populations and reduce health disparities.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Academic Research Enhancement Awards (AREA) (R15)
Study Section
Genetic Variation and Evolution Study Section (GVE)
Program Officer
Volpi, Simona
Loyola University Chicago
Schools of Arts and Sciences
United States
Andaleon, Angela; Mogil, Lauren S; Wheeler, Heather E (2018) Gene-based association study for lipid traits in diverse cohorts implicates BACE1 and SIDT2 regulation in triglyceride levels. PeerJ 6:e4314
Mogil, Lauren S; Andaleon, Angela; Badalamenti, Alexa et al. (2018) Genetic architecture of gene expression traits across diverse populations. PLoS Genet 14:e1007586