The Cys2His2 zinc finger DNA-binding domain is the most common protein domain in humans, yet the DNA-binding specificities of the great majority of these proteins remain undefined. Mutations in many of these domains, both with and without known DNA-binding data, have been linked to a host of diseases, from Alzheimer's disease (REST) to cancer (e.g., Slug, WT1, CTCF). The characterization of these proteins therefore holds great value. Unfortunately, the methodologies commonly used to determine the DNA-binding specificity of transcription factors have failed to address the zinc finger, at least in part because they cannot fully define the large target specificities required of the average mammalian zinc finger protein. Even when ChIP-seq data exist, they are limited because the genome is too small to capture the full binding potential of a factor that could recognize a target sequence on the order of 21 bp. As a result, without a comprehensive understanding of a protein's binding potential, SNPs across the genome will continue to represent potential binding sites that we are unable to predict. In sum, decades of research have advanced our understanding of this domain, but we remain in the dark about its function as a transcription factor. Recently, we have taken an alternative approach to defining this domain, demonstrating that a synthetic, one-by-one screen of individual zinc fingers allows us to predict the specificity of multi-fingered proteins with accuracy similar to or greater than that of all prior prediction algorithms. However, this approach does not account for the influences that adjacent fingers have on one another. In effect, we have produced a comprehensive snapshot of what a zinc finger is capable of in just one of many potential contextual environments. Here we propose to scale this approach and screen the zinc finger under an inclusive set of contextual environments.
We will consider the most common direct and indirect influences on adjacent-finger binding, as well as factors that affect the geometry with which the zinc fingers engage the DNA. We will use these results to provide a complete picture of how adjacent zinc fingers determine their specificity and, by scaffolding these two-finger models, to predict and design the specificity of large, multi-fingered proteins. In this way, we will define a multi-dimensional code of zinc finger specificity that allows us to predict all zinc finger DNA-binding specificities, how any neighboring-finger context would modify a given specificity, and the factors that result in adjacent-finger incompatibility and loss of DNA-binding function. We will apply this model to predict the specificity of all human zinc finger proteins, validate these predictions through in vivo characterization of an informed set of transcription factors, and test predicted mechanisms of multi-fingered binding with designed artificial factors.
The proposed research is relevant to public health because the zinc finger (ZF) domain is the most common protein domain in humans, yet it remains largely uncharacterized. A holistic understanding of ZF function will provide insight into how ZF mutations are related to disease and allow us to predict harmful binding sites created by SNPs across the genome.