Here, we propose to develop a two-step computational strategy that integrates functional genomic data to improve the power and resolution with which non-coding variants causal for autoimmune rheumatic disease are identified. The computational methods developed here address an important problem in disease biology: pinpointing the precise disease-causing mutations implicated by genome-wide association studies (GWAS) and understanding the biological mechanisms by which they act. We will develop our program using activated CD4+ T cells as a model system because of their relevance to autoimmune rheumatic disease, the availability of functional genomic data, and the ability to experimentally manipulate primary T cells and related cell lines. The three overlapping aims are:
1. Leverage allele-specific reads to increase the power of detecting functional genomic quantitative trait loci (fgQTLs). We will (i) develop an approach to accurately quantify allele-specific reads from functional genomic sequencing data while accounting for sequencing and mapping biases, (ii) develop a linear mixed model (LMM) method to perform phase-aware association tests for functional genomic traits, and (iii) apply the method to identify expression and chromatin accessibility QTLs in activated CD4+ T cells from ~100 individuals.
2. Nominate causal non-coding variants in autoimmune rheumatic disease-associated loci. We will (i) develop a method that leverages functional genomic QTLs to fine-map disease-causing variants in a locus, (ii) apply the method to integrate expression and chromatin accessibility QTLs from Aim 1 with three autoimmune rheumatic disease GWAS datasets to identify disease-causing variants most likely associated with CD4+ T cell activation, and (iii) computationally refine and annotate causal variants using orthogonal functional genomic data in CD4+ T cells.
3. Validate predictions using synthetic biology and genome engineering. We will (i) use massively parallel reporter assays (MPRAs) to test, in activated Jurkat cells, ~500 synthetic constructs harboring predicted causal variants from Aims 1 and 2 prioritized for GWAS loci, and use CRISPR/Cas9 to (ii) knock out 25 enhancers harboring causal variants (a subset of the MPRA hits) in Jurkat cells and primary CD4+ T cells and (iii) knock in 10 predicted causal variants in primary CD4+ T cells. We will observe the endogenous effects of these genome edits by profiling molecular and cellular phenotypes during CD4+ T cell activation and differentiation.
Although genetic differences between individuals have been associated with a number of autoimmune rheumatic diseases of significant public health impact, the precise disease-causing DNA variants have proven difficult to find. We propose to address this problem by developing a new set of computational tools that integrate orthogonal datasets to pinpoint disease-causing non-coding variants, together with an experimental validation scheme to test the predictions. Using our framework, we aim to identify genetic determinants of autoimmune rheumatic disease and their molecular mechanisms of action, which could lead to new methods of intervention.