The DNA-binding domains (DBDs) of transcription activator-like effectors (TALEs) contain modular assemblies of repeat variable diresidue (RVD) domains, which were recently discovered to have a strikingly simple DNA recognition code in which each RVD recognizes a single nucleotide. This simple modularity has made custom-designed TALE proteins an attractive option for fusion to nucleases, creating customized TALE nucleases (TALENs) that introduce mutations at specific target genomic sites; to a transcriptional activation domain for targeted activation of gene expression; to a transcriptional repression domain for targeted repression of gene expression; to a hydroxylase catalytic domain for targeted CpG demethylation; or to a histone demethylase for targeted lysine-specific demethylation. However, in-depth analysis of TALE-DNA recognition properties has not been undertaken. Potential off-target specificities could result in mutations at alternate recognition sites and/or increased toxicity from TALENs, or in undesired alterations of gene expression or epigenetic modifications from artificial TALE transcription factors (TFs). In this project, we will investigate potential context-dependent effects in TALE-DNA recognition. In addition, we will investigate the potential impact of the position of TALE repeats within the TALE DBD, as well as the potential influence of the sequence composition and DNA shape flanking the target sequence on TALE binding. We will also assay re-engineered TALEs predicted from molecular modeling to have improved DNA-binding specificities. Any such context dependencies could result in off-target specificities and/or altered binding efficiencies, and would need to be considered in projects that design custom TALE proteins for genome engineering or other synthetic biology applications.
We will use the resulting data to generate a computational model of TALE DNA-binding specificity, including implementation of online tools for predicting the DNA binding specificities of user-input TALEs and for design of TALE proteins for user-input DNA sequence regions of interest. We will also compare in vitro TALE-DNA binding data to in vivo target specificities.
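
For illustration only, the one-RVD-per-nucleotide code described above can be sketched as a simple lookup. The RVD-to-base preferences in the table are the commonly cited ones from the TALE literature, not data from this project, and the names `RVD_CODE`, `predict_target`, and `matches` are hypothetical. The sketch is deliberately context-free: it ignores repeat position and flanking sequence, which are precisely the effects this project proposes to test.

```python
# Commonly cited RVD-to-nucleotide preferences. This table is an assumption
# of the sketch, not a result of the project described here.
RVD_CODE = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "NN": "GA",  # NN is reported to accept G and, more weakly, A
}


def predict_target(rvds):
    """Return one set of accepted bases per repeat under a context-free code."""
    # An RVD missing from RVD_CODE raises KeyError rather than being guessed.
    return [set(RVD_CODE[rvd]) for rvd in rvds]


def matches(rvds, site):
    """Return True if every base of `site` is accepted by the RVD at that position."""
    if len(rvds) != len(site):
        return False
    accepted_per_position = predict_target(rvds)
    return all(
        base in accepted
        for base, accepted in zip(site.upper(), accepted_per_position)
    )


if __name__ == "__main__":
    tale = ["NI", "HD", "NG", "NN"]
    for candidate in ("ACTG", "ACTA", "ACTC"):
        print(candidate, matches(tale, candidate))
```

A model of the kind proposed here would presumably replace the binary accept/reject lookup with position- and context-dependent binding scores.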

Public Health Relevance

Transcription activator-like effectors (TALEs) appear to have a simple, modular DNA recognition code, which has garnered much attention for the use of custom-designed TALE proteins to target specific genomic sites in genome editing and synthetic biology applications. In this project, we will investigate potential DNA and protein context effects on TALE-DNA binding specificity. These effects are important to investigate because, if they exist, they could result in off-target specificities and would need to be considered in the design of custom TALE proteins.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Exploratory/Developmental Grants (R21)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pazin, Michael J
Brigham and Women's Hospital
United States