The vast majority (>90%) of disease- and trait-associated variants emerging from genome-wide association studies (GWAS) lie in non-coding regions of the genome, and all but a handful currently lack a molecular mechanism that explains the observed association with complex traits. Superimposing the human regulatory DNA catalogue on GWAS data reveals a striking concentration of disease-associated variation precisely within regulatory DNA regions defined by DNaseI hypersensitive sites. We will fully understand gene regulation and mis-regulation only if we are able to examine genetic variants in their biological chromatin context in otherwise isogenic cells. TALE nuclease (TALEN) platforms can alter the DNA sequence of cells in a precise, targeted manner. We have already established the feasibility of single-TALEN-mediated homologous recombination for efficient knockout of individual regulatory DNA regions. In the R21 phase, we will model candidate disease/trait-associated variants in regulatory DNA in vivo using engineered templates at 10 individual regulatory regions. We will test the feasibility and determine the salient operating characteristics of a scalable implementation of a TALEN genomic modification pipeline using K562 cells, demonstrate the feasibility of knockout and specific allele insertion in primary hematopoietic cells, and document the efficiencies achieved. We will also test the efficiency of using single or multiple TALENs flanking an editing site. In the R33 phase, we will scale the TALEN pipeline to efficiently characterize sites identified in GWAS as associated with red blood cell phenotypes. These studies will provide direct proof of the role of regulatory elements and demonstrate the relevance of GWAS-associated SNPs as causative for the observed phenotypes.
These studies will lay the foundation for demonstrating directly whether the regulatory elements that harbor GWAS-associated SNPs are indeed causative, and will shed mechanistic light on the role of these regulatory sequences.