Nuclear organization of DNA is complex and consists of multiple layers. At the lowest resolution, large sections of chromosomes are packed into territories; at higher resolution, chromosomal regions are organized into topologically associating domains that provide a framework for interactions between transcription factors (TFs) bound to promoters and to distal regulatory elements (enhancers); and finally, the structural interplay of regulators, RNA polymerase II, and chromatin leads to regulated gene expression. In recent years, a number of methods collectively called chromatin (conformation) capture (CC) have been developed and used to capture the dynamic and stable DNA contacts that constitute genome architecture and potential regulatory interactions. A major limitation of these methods is that they all depend on a single crosslinker, formaldehyde, which crosslinks DNA to proteins as well as proteins to other proteins. This complicates the interpretation of the observed 'DNA-DNA' contacts, and the resulting data carry no information about the physical distance between the contacting sites. Here we propose an orthogonal strategy, named Distance-Hi-C (D-Hi-C), in which we design, test, and apply a battery of photo-activated crosslinkers that directly measure distances between interacting sites genome-wide. These bivalent crosslinkers consist of two reactive groups separated by a linker. DNA intercalators that can be crosslinked by photo-activation (e.g., psoralen) will be used as the reactive groups. Linkers will be of varying, precisely defined lengths, and either flexible or rigid, so that the 3-dimensional distance between crosslinked loci can be inferred. Such DNA-specific bivalent crosslinking reagents, when substituted for formaldehyde in the Hi-C protocol, are expected to produce spatial constraints that reveal enhancer-promoter interactions and potentially allow the 3D arrangement of the nuclear genome to be inferred with unprecedented precision.
Moreover, D-Hi-C is expected to lower background and allow examination of short- and moderate-range interactions, which are obscured by the high background of current methods. Adding groups such as digoxigenin to the bivalent crosslinkers, in addition to the biotin incorporated into the ligation products, will enable better purification and thus deeper examination of genomic interactions. Finally, sampling DNA distances over time following gene activation provides a means of exploring the 4D architecture and of setting critical limits for evaluating mechanisms of gene activation.
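As an illustration of how linkers of different lengths could translate into distance constraints, the following minimal sketch brackets the separation of a pair of loci from which crosslinkers did or did not capture the contact. It is not the analysis method of this proposal: the linker names, the span values in nanometers, and the idealized all-or-nothing capture rule are assumptions made purely for the example, and real data would call for a probabilistic treatment of capture frequencies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Crosslinker:
    """A bivalent crosslinker (hypothetical representation)."""
    name: str
    max_span_nm: float  # assumed maximum distance the linker can bridge


def distance_bounds(
    observations: List[Tuple[Crosslinker, bool]],
) -> Tuple[Optional[float], Optional[float]]:
    """Bracket the distance between two loci.

    Idealized assumption: a contact is captured if and only if the
    loci lie no farther apart than the linker's maximum span.
    Returns (lower, upper) in nm; None means that side is unbounded.
    """
    captured = [c.max_span_nm for c, hit in observations if hit]
    missed = [c.max_span_nm for c, hit in observations if not hit]

    # Every linker that captured the contact caps the distance from above.
    upper = min(captured) if captured else None

    # A linker that failed pushes the distance above its span. Misses by
    # linkers at least as long as the upper bound contradict a capture and
    # are ignored here as noise.
    consistent = [m for m in missed if upper is None or m < upper]
    lower = max(consistent) if consistent else None
    return lower, upper


# Illustrative linker set; the spans are invented for this example.
short = Crosslinker("short", 2.0)
medium = Crosslinker("medium", 5.0)
long_ = Crosslinker("long", 10.0)

observations = [(short, False), (medium, True), (long_, True)]
print(distance_bounds(observations))  # (2.0, 5.0)
```

Under these assumptions, a contact captured by the medium and long linkers but not by the short one is placed between 2 and 5 nm. Repeating the same bracketing at successive time points after gene activation would give a coarse view of how a given distance changes over time.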
The specific aims are to 1) synthesize a battery of bivalent crosslinkers and evaluate their ability to crosslink DNA in vitro; 2) test D-Hi-C crosslinkers relative to formaldehyde in a Drosophila nuclei model; 3) apply the new crosslinkers to the ENCODE tier 1 cell lines GM12878 and K562 to test their efficacy in human cells; and 4) test locus-specific photo-crosslinking, which will allow a more focused and thorough examination of a locus's interactions with the rest of the genome over a time course of gene activation. The development of these methodologies will be broadly useful in providing critical insights into the gene regulatory mechanisms that operate in normal animal development and homeostasis and that go awry in diseases such as cancer.
The genetic information of all living organisms is encoded in DNA, which is packed in the cell nucleus. Expressing this information in a properly regulated manner depends not only on the DNA code itself but also on how the DNA is organized within the nucleus. The goal of this proposal is to develop a method that can reveal the organization of DNA in the nucleus with unprecedented resolution, and thus inform us about the mechanics of how genes are misregulated to produce disease conditions and how effective therapies could be developed.