CTCF, a highly conserved DNA-binding protein, serves as a global organizer of chromatin architecture. CTCF is involved in the regulation of transcriptional activation and repression, gene imprinting, control of cell proliferation and apoptosis, chromatin compartmentalization, X-chromosome inactivation, prevention of trinucleotide-repeat expansions, and other chromatin-resident processes. It took us over 20 years of CTCF studies to persuade others that the multi-functionality of CTCF is indeed based on the ability of a highly conserved 'multivalent' 11-zinc-finger (ZF) DNA-binding domain (DBD) to bind a wide range of diverse DNA sequences, as well as on its intrinsic capacity to interact with partner proteins through the combinatorial usage of DNA-contacting and protein-contacting ZFs. Last year, a similar multivalency was shown for another poly-ZF DBD array in the Drosophila Su(Hw) factor. With the advent of next-generation sequencing techniques, CTCF binding sites have been identified across the fly, mouse, and human genomes. Reflecting the multitude of CTCF functions, many thousands of non-homologous CTCF target site (CTS) sequences were found to be associated with genomic regions engaged in long-range chromatin interactions, including enhancers, promoters, and intergenic boundary elements. It remained obscure, however, how the particular DNA sequence of any given CTS is related to the specific CTCF functions at that site. This year, we have made additional advances toward understanding the multiple functionality of distinct CTCF/DNA complexes formed via different combinations of DNA-contacting fingers. By mapping simultaneous CTCF and BORIS occupancy genome-wide, we uncovered two classes of CTCF binding regions that are pre-programmed and evolutionarily conserved in DNA sequence.
We found that 70% of CTCF-bound regions enclose a single CTCF binding site (1xCTSes), while the other 30% of CTCF-binding regions, detected by ChIP-seq as single peaks, in fact contain dual CTCF binding sites (binary 2xCTSes). Occupancy of adjacent CTSes within binary 2xCTS regions constrains two adjacent CTCF proteins to form homodimers in normal somatic cells, or to assemble CTCF+BORIS heterodimers co-bound at the same DNA spot in germ and cancer cells that co-express BORIS on top of CTCF. The recent breakthrough discovery of 2xCTS regions (unresolvable by standard CTCF-specific ChIP-seq) enabled us, for the first time, to address the long-standing question of how CTCF can serve, within the same nucleus, as a bona fide transcription factor while maintaining a substantial presence at putative insulator/boundary sites that bear no indications of transcriptional activity. Indeed, only 20% of all CTCF binding regions are located in promoter regions in any given cell type, while the remaining CTSes are not associated with transcriptional start sites. The obvious candidates for the determinants of such distinct functional roles would be the DNA sequences themselves and/or the differential identity of chromatin at these two types of sites. In our study, we presented genome-wide evidence that the DNA sequences underlying the two types of CTCF target sites are structurally different. This structural difference between the two classes of CTCF binding sites is connected to their functional differences: 2xCTSes are preferentially located at H3K27ac-marked promoters and enhancers co-bound by Pol II, and the same 2xCTS elements are found to be associated with normal CTCF-BORIS heterodimers in post-meiotic spermatids, wherein BORIS marks the future protamine-free DNA zones that retain modified histones along the haploid epigenome in mature human and mouse spermatozoa.
In stark contrast, intergenic and intronic genomic regions harbor one or more 1xCTS-based CTCF peaks with the name-giving 5'-CCC(C/t)CT(a/g)-3' motif, which is often hit by disease-associated SNPs affecting the three-dimensional organization that rests on essential self-interactions among sticky C-termini and DNA-free ZF subsets from distal CTCF/DNA complexes engaged in site-specific di-/multimerization stabilized by cohesin retention. A remarkable link with CTCF+/- haploinsufficiency found in genetically burdened human subjects might open up a novel avenue for clinically oriented CTCF studies. Such studies would address aberrant histone and DNA methylation encompassing CTCF-bound ChIP-seq peaks with 2xCTS elements in H3K27ac-marked, Pol II-bound promoter-enhancer pairs, which are capable of altering gene expression in the same way that we had previously found in Ctcf+/- mice analyzed in collaboration with Chris Kemp and Galina Filippova from the Fred Hutchinson Cancer Center in Seattle. Hence, it is possible that there is a common underlying pathomechanism for the disorders caused by CTCF deletions, distinct from the complete loss of classic tumor suppressor gene (TSG) functions of CTCF reported for the first time by the same team. Similar pathology-associated mechanisms therefore seem to underlie both human and mouse genetic disorders caused by insufficient CTCF dosage, exclusive of additional ZF mutations. Even in tumors with 16q22/CTCF LOH, such mutations would cause a complete CTCF loss leading to death, rather than the partial loss of DNA-CTCF interactions caused by in vivo selection of viable single amino acid substitutions within the multivalent 11 ZF CTCF DBD that were characterized first in CTCF (1996) and found later (2002) to be recapitulated in the CTCF-derived paralog named BORIS (an acronym for Brother Of the Regulator of Imprinted Sites).
Next, the original discovery and further studies of the binary 2xCTS code have begun to challenge a widespread misconception in the current literature, namely that all CTCF sites are equivalent to each other, with a single CTCF molecule bound at a single CTS sequence. In fact, CTS elements with different genomic coordinates may contain either one or two adjacent DNase I footprints over single or dual CTCF motifs, without any of the homologies necessary for reliable motif-based predictions. The functional and chromatin structural features of enhancer/promoter-associated 2xCTS elements are distinct from those of 1xCTS-containing regions bound by CTCF monomers, mostly within intronic and long intergenic zones along mouse and human chromosomal DNA. We suggest that these two previously overlooked classes of CTCF binding regions may have different roles in regulating diverse chromatin-based phenomena, and may impact our understanding of heritable epigenetic regulation in cancer cells and normal germ cells. For instance, non-random retention of sperm nucleosomes placed selectively into protamine-free loci was found to be pre-determined by the nucleotide context of the same 2xCTS-containing CTCF elements that are normally co-bound by both the CTCF and BORIS 11 ZF paralogs, which are normally co-expressed in post-meiotic round spermatids. Moreover, CTCF and the cohesin complex are widely recognized as key players in the establishment and maintenance of 3D genome architecture in all mammalian cells. These proteins are not just well known in the scientific community but have recently entered the popular press, including the March issue of Scientific American at www.scientificamerican.com/article/untangling-the-formation-of-dna-loops.
Taken together, our results provide a global view of chromatin dynamics and a resource for studying long-range control of gene expression in distinct human cell lineages. They also explain why, out of a multitude of transcription factors, only CTCF has been recognized as a universal and possibly irreversible epigenetic mark, present in all cell types at functionally distinct regions, that orchestrates the non-random positioning of modified histones and methylated DNA at hundreds of thousands of CTCF-associated DNA sequences catalogued by the ENCODE Consortium at www.factorbook.org/human/chipseq/tf/CTCF.