More than 90% of somatic mutations in cancer fall in non-coding regions of the genome. While the vast majority of these mutations are expected to be passengers, a small fraction could act as drivers by disrupting regulatory interactions between transcription factors (TFs) and DNA, leading to dysregulation of gene expression. Despite the increased availability of whole-genome cancer somatic mutation data, identifying non-coding driver mutations in TF binding sites remains a challenge, likely because of gaps in our understanding of the mutagenic processes acting on regulatory DNA. There is increasing evidence that TFs bound to the genome can interfere with normal DNA replication and repair, which could lead to increased mutation rates. However, the precise molecular mechanisms of these interactions are poorly understood. The current project will address this gap by using new experimental techniques and computational models to directly investigate the molecular mechanisms by which TF binding can lead to genetic mutations. Our overarching hypothesis is that TFs bound to genomic sites containing DNA lesions can cause inefficient repair or correction of the lesions, and consequently increased mutation rates. Among DNA lesions, we will focus on base mismatches, which are important precursors for a majority of cancer mutations and are frequently generated in the genome during DNA replication, recombination, and even repair of other lesions. Mismatches that are retained in cancer genomes because they are bound by TF proteins are therefore expected to contribute substantially to the hyper-mutation signal observed at TF binding sites. It is thus critical to characterize the TF-DNA binding landscape in the presence of DNA mismatches (Aim 1), and to thoroughly investigate the mechanisms by which TF binding to mismatches can lead to genetic mutations (Aim 2).
Overall, our proposed work represents a first step toward determining whether the enrichment of cancer mutations in TF binding sites is due, at least in part, to DNA-bound TFs that act as roadblocks for DNA mismatch correction and repair. Longer term, a thorough understanding of the role of TFs in mutagenesis at regulatory DNA will be instrumental for developing accurate methods to identify regulatory driver mutations in cancer.

Public Health Relevance

Two-thirds of genetic mutations that lead to cancer originate from unrepaired replication errors, most commonly mismatched base pairs. Recent studies have hypothesized that transcription factor proteins act as roadblocks for DNA repair and replication, leading to mutations in regulatory DNA. The computational models and experimental approaches developed in this project will lead to a better understanding of how transcription factors and mismatch repair enzymes recognize DNA mismatches, how they compete for binding to mismatches, and whether and how this competition leads to genetic mutations. In the long term, the work proposed here will lead to the development of null models for how mutations are generated in transcription factor binding sites; these models will be instrumental for identifying non-coding driver mutations in cancer.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Keane-Myers, Andrea
Duke University
Biostatistics & Other Math Sci
Schools of Medicine
United States