A large fraction of trait/disease-associated loci from genome-wide association studies (GWAS) are intronic or intergenic. A major barrier to identifying the variants responsible for a given human trait or disease is the limited understanding of the function of the noncoding genome. While there have been major developments in analytical tools that exploit GWAS and large-scale epigenome resources to elucidate the cell/tissue types and epigenomic events relevant to GWAS loci, comparative genomics methods through mouse engineering approaches are critically lacking. This is a clear hindrance to leveraging large-scale, well-powered model organism eQTL and QTL studies, such as those from Diversity Outbred mice, to understand the mechanisms underlying human diseases. Current practice for moving between human and model organism genomes relies solely on sequence similarity-based mapping. However, this approach leaves 60-70% of SNPs unmapped and maps a significant fraction to multiple locations. This project addresses key difficulties in this area by developing a biologically relevant and statistically rigorous method, liftSNP, that goes beyond sequence similarity and incorporates epigenome and higher-order regulatory grammar into the mapping of human GWAS SNPs to model organism genomes. liftSNP will be developed and evaluated on GWAS SNPs from three diverse disease systems (hematologic/developmental; obesity, metabolic syndrome, T2D; neurological/autism). The results of these large-scale applications will be made available through atSNP Search and will enable researchers to lift over their GWAS SNP-harboring genomic loci to the mouse genome in a functionally relevant manner.
The aims will be accomplished through a combination of methodological development, theoretical analysis, data-driven simulation, computational analysis, and experimental validation. Statistical resources generated from this project will be disseminated as open-source software. Collectively, these aims will significantly enhance the comparative genomics interpretation of GWAS results.
There has been significant progress in the interpretation of disease/trait-associated noncoding SNPs and in genome engineering through genome editing. However, comparative genomics interrogation of GWAS results that leverages model organism data is critically lacking. This project seeks to develop novel statistical models that map human GWAS SNPs to model organism genomes in a functionally relevant way, enabling follow-up studies and the use of large-scale model organism data from well-powered eQTL and QTL studies.