While the 2% of our genome that encodes proteins has been successfully annotated and characterized, deciphering the regulatory code embedded in the remaining 98% remains a challenge. Improving our understanding of the regulatory code is essential for determining the cell-type-specific regulatory circuits that govern organogenesis and contribute to disease. Developmental regulatory networks in the sea urchin have been comprehensively described. In vertebrates, however, most studies of regulatory variation have taken a top-down approach, computationally predicting regulatory elements or experimentally characterizing binding sites of individual transcription factors. Although many known regulatory regions are large and complex, preliminary in vivo functional studies indicate that DNA sequences as short as six base pairs (bp) can drive precise, tissue-specific expression of a reporter gene, and that a two-nucleotide substitution can change the expression domain. The proposed project will pioneer a novel bottom-up approach to decipher the vertebrate regulatory code by characterizing the regulatory potential of all 6-bp sequences (6-mers). These short motifs will be tested for enhancer activity during development using zebrafish transgenesis. The first specific aim of the project is to computationally design a collection of reporter constructs that covers all 6-mers compactly and enables efficient functional characterization of 6-mer enhancers. This is a challenging computational problem, and the algorithms designed and implemented at this stage will be a powerful resource for the efficient design of oligomers for future enhancer experiments and many other biological assays.
The second aim is to compute and analyze the genomic distributions of experimentally validated enhancers and use the results to interpret expression data. Finally, the third aim is to develop and test models for the interaction of multiple regulatory 6-mers. Particular attention will be devoted to additive effects of combinations of enhancers and the identification of potential silencers. By building and functionally translating a regulatory language from scratch, this project complements top-down efforts to understand the regulatory code. This project will have an enormous impact on numerous biological fields, from developmental and evolutionary biology to genome annotation. Various clinical applications will also benefit from this project, such as reprogramming strategies for stem-cell-based regenerative therapies. In addition, it will pave the way for gene therapy by enabling the engineering of regulatory elements that drive compounds to specific tissues at different time points.
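
The compact-coverage problem in the first aim can be illustrated with a small sketch. The abstract does not say which algorithm the project uses; the example below swaps in a standard de Bruijn sequence construction (the Lyndon-word recursion) purely as an illustration of how all 4^6 = 4096 6-mers can be packed into a short sequence and then cut into overlapping constructs. The function names, the construct length of 30 bp, and the overlap scheme are hypothetical, and the sketch ignores practical constraints such as reverse complements, flanking sequence, and cloning sites.

```python
from itertools import product


def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence of order k (standard Lyndon-word recursion).

    Read cyclically, every k-mer over `alphabet` occurs exactly once.
    """
    n_sym = len(alphabet)
    a = [0] * (n_sym * k)
    out = []

    def extend(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for j in range(a[t - p] + 1, n_sym):
                a[t] = j
                extend(t + 1, t)

    extend(1, 1)
    return "".join(alphabet[i] for i in out)


def split_into_constructs(seq: str, construct_len: int, k: int) -> list[str]:
    """Cut seq into windows that overlap by k-1 bases so no k-mer is lost."""
    if construct_len < k:
        raise ValueError("construct_len must be at least k")
    step = construct_len - (k - 1)
    return [seq[i:i + construct_len] for i in range(0, len(seq) - k + 1, step)]


def kmers_in(seqs: list[str], k: int) -> set[str]:
    """Set of all k-mers occurring in any of the given sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


K = 6
CONSTRUCT_LEN = 30  # hypothetical construct length, chosen only for illustration

cyclic = de_bruijn("ACGT", K)
linear = cyclic + cyclic[:K - 1]  # unroll the cycle so wrap-around 6-mers appear
constructs = split_into_constructs(linear, CONSTRUCT_LEN, K)

all_6mers = {"".join(p) for p in product("ACGT", repeat=K)}
print(len(linear), len(constructs), kmers_in(constructs, K) == all_6mers)
```

A de Bruijn sequence is the shortest single sequence containing every 6-mer on one strand, which is why it serves as a convenient lower-bound reference here; a real design would need to trade this compactness against the experimental constraints mentioned above.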

Public Health Relevance

Development proceeds via spatially and temporally exquisite patterns of gene regulation, which influence a wide range of biological processes from organ formation to human diversity and disease. The goal of this project is to efficiently test all short (6-bp) DNA sequences for their potential as regulators of gene expression during vertebrate development. This work will significantly advance our knowledge about the human genome and developmental gene regulation, with important applications in synthetic biology and the development of therapeutics.

National Institutes of Health (NIH)
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Postdoctoral Individual National Research Service Award (F32)
Study Section
Special Emphasis Panel (ZRG1-F08-E (20))
Program Officer
Coulombe, James N
Broad Institute, Inc.
United States
Smith, Robin P; Riesenfeld, Samantha J; Holloway, Alisha K et al. (2013) A compact, in vivo screen of all 6-mers reveals drivers of tissue-specific expression and guides synthetic regulatory element design. Genome Biol 14:R72