The human genome contains over 700 genes encoding proteins with zinc finger domains, more than half of which contain eight or more fingers organized in tandem. Many of these proteins function as transcription factors, insulator binding proteins, or chromatin modifiers. Despite their importance, we still lack comprehensive knowledge of the rules that determine these proteins' binding to DNA, and the existing prediction programs do not perform satisfactorily. Recently, we have developed two new methods for the isolation and deep sequencing of zinc finger protein binding sites. The first, Affinity-seq, directly determines the relative affinity of tens of thousands of binding sites genome-wide with high binding specificity. It also provides the opportunity for mutational analysis of binding site specificities using alternate sources of genomic DNA. The second, Spec-seq, determines the changes in binding energy for thousands of variants of a preferred sequence, and their sensitivity to DNA methylation. We propose to apply these methods to a comprehensive analysis of the DNA binding sites of over twenty natural mouse and human protein variants of the recombination regulator PRDM9, as well as over one hundred other human and mouse zinc finger proteins, which represent different groups of long zinc finger array proteins (laZFPs) and whose binding sites have not been determined previously.
In Aim 1, we will determine the specificities of PRDM9 protein variants binding to DNA.
Aim 1a will use Affinity-seq to determine how systematic changes in the contact amino acids, number, and interactions of zinc fingers (ZFs) in PRDM9 protein variants affect their DNA binding.
Aim 1b will determine the quantitative specificity and sensitivity to DNA methylation of each PRDM9 protein variant by Spec-seq.
Aim 1c will use cell culture approaches to determine how conserved features of ZF arrays and combinations of motifs in the same array affect the biological activity of engineered PRDM9 protein variants.
In Aim 2, we will determine whether DNA-binding specificities of different laZFP groups co-evolve with their additional domains.
Aim 2a will determine the commonality or uniqueness of the rules governing binding to DNA of laZFPs belonging to BTB-, SCAN-, SET-, and KRAB-containing groups, and those without additional domains, by Affinity-seq.
Aim 2b will determine their quantitative specificity and sensitivity to CpG methylation (mCpG) status by Spec-seq.
In Aim 3, we will develop new and improved computational algorithms for binding site modeling and motif prediction based on laZFP sequences, including mCpG sensitivity.
Aim 3a will develop enhanced specificity representations of ZFPs that take full advantage of the Spec-seq data and do not impose the positional independence inherent in PWM models.
Aim 3b will develop improved motif prediction models including methylation sensitivity.
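To illustrate the positional independence mentioned in Aim 3, the following minimal Python sketch contrasts an additive PWM-style energy score with one that adds correction terms for adjacent base pairs. It is an illustration only, not the proposed method: the function names, the toy 3-position matrix, and the single dinucleotide correction are all hypothetical, and the numbers are not derived from any data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy energy matrix for a 3-bp site (arbitrary units,
# lower = more favorable). Values are invented for illustration.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.0, "C": 1.0, "G": 1.5, "T": 2.0},
    {"A": 1.0, "C": 0.0, "G": 0.5, "T": 1.5},
    {"A": 2.0, "C": 1.0, "G": 0.0, "T": 0.5},
]

# Hypothetical correction for one adjacent dinucleotide, keyed by
# (start position, dinucleotide). Pairs not listed contribute nothing.
TOY_PAIR_TERMS: Dict[Tuple[int, str], float] = {(1, "CG"): -0.5}


def pwm_energy(site: str, pwm: List[Dict[str, float]]) -> float:
    """Additive model: each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must match matrix length")
    return sum(pwm[i][base] for i, base in enumerate(site))


def pairwise_energy(
    site: str,
    pwm: List[Dict[str, float]],
    pair_terms: Dict[Tuple[int, str], float],
) -> float:
    """Additive model plus corrections for adjacent base pairs."""
    energy = pwm_energy(site, pwm)
    for i in range(len(site) - 1):
        energy += pair_terms.get((i, site[i : i + 2]), 0.0)
    return energy


if __name__ == "__main__":
    for site in ("ACG", "ACT", "TCG"):
        print(site, pwm_energy(site, TOY_PWM),
              pairwise_energy(site, TOY_PWM, TOY_PAIR_TERMS))
```

In this sketch the two scores differ only for sites containing the listed dinucleotide, which is the kind of position-to-position dependence an additive PWM cannot represent.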
Zinc finger proteins (ZFPs) represent the most abundant class of DNA binding proteins in mammalian genomes. Many of these proteins function as transcription factors, insulator binding proteins, and chromatin modifiers, which makes the ZF motif the most broadly used mechanism of protein-DNA binding and a regulator of myriad processes in metazoans. The proposed studies will have a significant impact on our understanding of the mechanisms of action of genes involved in all genetically influenced human diseases, including metabolic diseases, systemic disorders, infertility, and cancer.