The goal of this proposal is to discover and interpret the code by which cis-regulatory DNA controls gene expression. Although cis-regulatory logic is reasonably well understood in bacteria and yeast, this is not the case in multicellular organisms. With a very large number of differentiated cell types, each of which has the same genetic material but expresses a different set of genes, metazoans such as humans devote large regions of DNA to biologically essential regulatory functions. These regulatory regions are targets of selection and evolutionary change, and noncoding polymorphisms are connected to disease susceptibility. We have developed a new approach to cis-regulatory logic based on understanding the principles that govern which configurations of bound transcription factors activate transcription and which configurations repress it. This approach is implemented in a model trained on quantitative expression data at cellular resolution from blastoderm-stage embryos of Drosophila melanogaster, which we use as a naturally grown gene chip. The model correctly predicts expression from DNA not used in the training procedure, including highly diverged sequence from distantly related species. We will interpret the cis-regulatory code by making use of a suite of tools applied to D. melanogaster and its sibling species D. erecta and D. virilis. The consideration of regulatory circuits across species at the resolution proposed represents a profound extension of network analysis. Supporting techniques include targeted chromosomal transformation of Drosophilid embryos, a sequence-based model of transcriptional control having an established predictive capability for a range of problems, whole-locus transgenes engineered at single-nucleotide resolution with recombineering, and methods for designing and testing synthetic enhancers.
The foregoing methods will allow us to test proposed principles of cis-regulatory logic as they are developed in the context of naturally occurring and artificial sequences, and in perturbed trans-environments. Our ultimate goal is to predict the expression patterns of whole genes and synthetic enhancers directly from genomic sequence and data on transcription factor expression. These objectives are summarized in the following four specific aims. 1) Design, synthesize, and experimentally test completely defined artificial enhancers that express naturally occurring or arbitrarily chosen patterns on the anterior-posterior axis. 2) Construct and experimentally test a model of the embryonic expression of the complete even-skipped locus. 3) Build a quantitative map of maternal gradients and gap gene expression in Drosophila virilis and Drosophila erecta. 4) Construct testable models of the maternal-gap-eve networks in Drosophila virilis and Drosophila erecta.
Although the function of the portion of DNA sequence that codes for protein is understood, the function of the portion that determines how genes are turned on and off remains to be elucidated. The goal of this project is to understand how DNA sequence controls gene expression, using the fruit fly as an experimental system. The basic science developed in this project will have long-term medical applications because cancer and many birth defects result from genes being turned on and off incorrectly.