A major function of the non-protein-coding genome is to direct specific patterns of gene expression by encoding binding sites for transcription factors. However, large genomes contain millions of spurious copies of the short, degenerate sequence motifs that transcription factors recognize. It is not understood how genuinely functional binding sites are distinguished from non-functional motifs. This is a critical unsolved problem, because growing evidence indicates that genetic variants that disrupt or create transcription factor binding sites play a widespread role in disease, but we currently cannot accurately identify variants that affect active binding sites. To address this issue, our long-term goal is to understand how active transcription factor binding sites are specified by their DNA sequence features. Recent results suggest that functional binding sites are distinguished from spurious motifs by critical local sequences that flank the core motif. Using the mammalian retina as a physiologically relevant model system to address this broad issue, we will investigate binding sites for the photoreceptor transcription factor CRX.
Our specific aims are: First, to understand how flanking DNA sequence features distinguish transcriptionally active CRX binding sites from inactive genomic sequences with spurious CRX motifs. Second, to quantify and model the effects of local flanking sequence features using a tractable system of synthetic CRX binding sites. The major innovation of this proposal is to measure both transcription factor binding and cis-regulatory activity on a large set of wild-type, mutant, and synthetic sequences, and thereby directly quantify the relationship between flanking DNA sequence, binding, and activity. Using recently developed high-throughput assays to measure cis-regulatory activity and CRX binding affinity, we will directly test the functional role of local flanking sequences by disrupting flanking sequence features of CRX binding sites. By combining functional genomics and synthetic biology to investigate natural and synthetic CRX binding sites, we will discover how different DNA sequence features combine to specify active CRX sites in the genome. The result will improve our understanding of how sequence variants outside core motifs affect transcription factor binding sites.

Public Health Relevance

DNA sequences that regulate genes play a critical role in normal development, and growing evidence indicates that mutations in regulatory DNA play a major role in disease. However, the ways in which mutations affect regulatory DNA are poorly understood. The goal of this project is to understand how regulatory DNA sequences are encoded in the genome, and thereby establish a basis for understanding how the function of these sequences is altered by disease mutations.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM121755-04
Application #
9951058
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Project Start
2017-07-01
Project End
2022-06-30
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
4
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Washington University
Department
Genetics
Type
Schools of Medicine
DUNS #
068552207
City
Saint Louis
State
MO
Country
United States
Zip Code
63130