Enhancers are the genomic elements that encode the instructions for when and where genes are expressed during development. The majority of mutations leading to disease are thought to reside in enhancers. However, we do not understand which changes in enhancer sequence are inert sequence variations between individuals or populations and which impact gene regulation and cellular integrity. These fundamental questions remain unsolved because we cannot relate enhancer sequence to gene expression and phenotype. This gap in our knowledge is stalling our ability to interpret genomic data and understand development, cellular integrity, and disease. Enhancer sequences provide a scaffold to which transcription factors bind by recognizing specific signatures in the DNA. The physical constraints that govern how these proteins interact with enhancer DNA could lead to a set of grammatical constraints that can be used to understand the relationship between enhancer sequence and tissue-specific gene expression. I propose the development of a toolkit of methodologies and approaches to decipher the grammatical constraints on tissue-specific enhancer activity. I will use highly parallel functional reporter assays and high-throughput genotype-to-phenotype studies, along with biochemical assays, synthetic biology, and loss-of-function strategies. These experiments will be carried out in the chordate Ciona intestinalis, as it is a unique system in which millions of enhancer variants can be assayed for function in all cells of a developing embryo. Work in Ciona will be complemented with experiments in chick, mouse, and tissue culture to directly inform vertebrate development and pinpoint mutations causing disease. Determining the "genetic code" that translates the coding sequences of genes into protein has provided detailed insight into a major component of our genome.
Once we have a similar code to decipher the instructions for when and where these genes are expressed, we will have powerful tools to understand how the genome encodes the instructions for building and maintaining life.
Elucidating the grammatical constraints that relate enhancer sequence, tissue-specific gene expression, and phenotype will allow us to read the instructions for development and organismal integrity that are encoded in the genome, as well as pinpoint the mutations that underlie the many diseases caused by functional mistakes in enhancer sequence. This information can be used to devise regenerative medicine approaches, understand the cause of developmental defects and genetic diseases, and develop novel therapeutics to treat such afflictions.