Genome-wide association studies (GWAS) have revealed thousands of genetic loci associated with common diseases. For each locus, a 'tag' single nucleotide polymorphism (SNP) has been identified along with dozens of additional SNPs in linkage disequilibrium. However, for many of these loci, the causal SNP(s) and the genes they affect are not known. Since most SNPs fall in non-coding regions of the genome, and relatively little is known about the functions of these regions, it remains challenging to pinpoint the causal SNP based on its predicted impact on function. Despite the availability of non-coding genome annotation databases and the development of computational approaches for narrowing down causal SNPs, most causal SNPs have not been validated, and there is an urgent need for large-scale validation of candidate SNPs. Two technological advances make it possible to address this need and comprehensively validate thousands of candidate SNPs. First, DNA synthesis can now be performed in a parallel and cost-effective manner, enabling the rapid generation of millions of DNA reporter constructs to test the impact of non-coding SNPs on reporter expression. Second, CRISPR-based genome editing tools are evolving at a fast pace and can now directly alter non-coding sequences in the genome at medium to high throughput. The combination of these two methods to perturb DNA sequences and study the consequences provides a powerful approach well suited to finding a small number of causal SNPs within a larger set of candidates. In a recent study from our group, we used these two approaches to pinpoint causal SNPs for a small number of genes in the immune system, providing a proof-of-principle for this proposal. We now extend this approach computationally and experimentally.
First, we will develop a Bayesian hierarchical framework that integrates annotation datasets to fine-map and nominate causal SNPs for gene expression, and a meta-analysis approach to fine-map causal GWAS SNPs based on validated eQTL SNPs. Second, using gene expression in immune cells as a proxy for disease, we will apply massively parallel reporter assays (MPRAs) and efficient genome engineering with CRISPR to test the impact of candidate GWAS SNPs on gene expression. Third, we will use these datasets to refine the computational models to better predict causal SNPs. Our datasets and models are expected to: (i) improve our ability to predict causal SNPs in any disease; (ii) lead us to deduce principles of genetic and functional variation in the immune system; and (iii) reveal mechanisms underlying common human immune diseases.
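As an illustration of the kind of annotation-informed fine-mapping described above, the following minimal sketch computes, for each SNP at a locus, the posterior probability of being causal. It is not the proposed framework: it assumes a single causal SNP per locus, uses Wakefield's approximate Bayes factor computed from summary statistics, and uses a simple log-linear prior on one binary annotation. The function names, the prior standard deviation of 0.15, the annotation weight and the example values are all hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log of Wakefield's approximate Bayes factor (association vs. null).

    beta, se: effect estimate and its standard error for one SNP.
    prior_sd: assumed standard deviation of the true effect (hypothetical value).
    """
    v = se ** 2                # sampling variance of the estimate
    w = prior_sd ** 2          # prior variance of the true effect
    r = w / (v + w)            # shrinkage factor
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z


def posterior_probs(snps, annot_weight=1.0):
    """Posterior probability that each SNP is the single causal SNP at a locus.

    snps: list of (beta, se, annotated) tuples, where annotated is 0 or 1.
    annot_weight: assumed log prior odds enrichment for annotated SNPs
                  (hypothetical; a hierarchical model would estimate it).
    """
    # Unnormalized log posterior = log Bayes factor + log prior weight.
    log_scores = [log_abf(b, s) + annot_weight * a for b, s, a in snps]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores)
    weights = [math.exp(x - m) for x in log_scores]
    total = sum(weights)
    return [x / total for x in weights]


# Made-up locus: two SNPs in perfect LD with identical association
# statistics, one of which overlaps a regulatory annotation.
example = [(0.30, 0.05, 1), (0.30, 0.05, 0), (0.05, 0.05, 0)]
for snp, p in zip(example, posterior_probs(example)):
    print(snp, round(p, 3))
```

In this toy locus, the association statistics alone cannot separate the first two SNPs, so the annotation prior is what breaks the tie. In the hierarchical setting, the annotation weight would be learned across many loci rather than fixed in advance.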
Common autoimmune diseases - such as lupus, type 1 diabetes or arthritis - are in part caused by genetic changes in an individual's genome. While some of these changes have been found, human geneticists are often unable to pinpoint the exact DNA changes that contribute to disease. In this proposal, we invent and test methods to find the DNA changes with functional effects on the cells that contribute to immune diseases.