cis-Regulatory modules (enhancers) are genomic DNA fragments (0.5 to 1.0 kb long) that contain multiple binding sites for sequence-specific DNA-binding transcription factors, which collectively control the temporal and spatial expression dynamics of flanking genes. While DNA sequence alignments between Drosophila melanogaster genes and their orthologous DNAs outside the genus are of limited use in identifying enhancers, the additive evolutionary divergence among 12 Drosophila species is of great utility for identifying functional, conserved sequences within enhancers. For example, all Drosophila enhancers characterized thus far contain multiple conserved sequence blocks (CSBs), made up of DNA-binding sites for known and as yet unidentified transcriptional regulators. Comparative genomic analysis among vertebrates also reveals that many of their enhancers contain CSBs. Recent studies have demonstrated that co-regulating enhancers share conserved sequence elements. We are developing computer algorithms to identify repeat sequences within CSB clusters (CSCs) and to search for co-regulating enhancers throughout the Drosophila genome based on their shared conserved sequence elements. A genomic CSC database is also being developed; it currently consists of over 70,000 CSCs obtained from evolutionary gene prints that span 70% of the Drosophila genome. Our search algorithms are designed to scan this database to detect related enhancers by a two-step protocol: first, database CSCs that contain the same repeated sequences as the input CSC are identified; then, via one-on-one alignments, these CSCs are ranked by the number of sequence elements they share with the input enhancer.
This method has several advantages over previous enhancer search methods: 1) it makes no assumptions about the function of the conserved sequences (over 50% of the shared sequences do not represent DNA-binding sites for known transcription factors); 2) it requires no a priori knowledge of the functional elements in a given CSB cluster; and 3) it allows the user to focus on genes that are co-expressed in any given biological event, e.g., neural stem cell lineage development, to discover functionally related enhancers that regulate the expression of neuron identity genes. We believe that the CSC database and search algorithms will become part of the next generation of tools for the discovery and analysis of Drosophila cis-regulatory DNA sequences. This methodology will also serve as a model for identifying functionally related mammalian cis-regulatory DNAs.
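
The two-step search protocol described above can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that a CSB cluster is a list of conserved sequence block strings, that "repeated sequences" are fixed-length k-mers occurring at least twice within a cluster on the forward strand, and that the one-on-one comparison can be approximated by counting shared k-mers. The function names, the toy sequences, and the choice k=6 are all hypothetical.

```python
from collections import Counter


def kmers(block, k):
    """Yield every k-length substring of one conserved sequence block."""
    for i in range(len(block) - k + 1):
        yield block[i:i + k]


def kmer_counts(csc, k):
    """Count k-mers across all blocks of a CSC (forward strand only)."""
    counts = Counter()
    for block in csc:
        counts.update(kmers(block.upper(), k))
    return counts


def repeated_kmers(csc, k):
    """Return the k-mers that occur at least twice within the CSC."""
    return {kmer for kmer, n in kmer_counts(csc, k).items() if n >= 2}


def search(query, database, k=6):
    """Filter database CSCs by shared repeats, then rank by shared k-mers."""
    query_repeats = repeated_kmers(query, k)
    query_all = set(kmer_counts(query, k))
    hits = []
    for name, csc in database.items():
        # Step 1: keep only CSCs that share a repeated k-mer with the query.
        if not (query_repeats & repeated_kmers(csc, k)):
            continue
        # Step 2: pairwise comparison, scored by the number of shared k-mers
        # (a simplifying stand-in for a one-on-one alignment).
        shared = query_all & set(kmer_counts(csc, k))
        hits.append((name, len(shared)))
    # Highest score first; ties are broken alphabetically by name.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy data, invented for illustration only.
query = ["ACGTGCAAT", "TTACGTGCC"]
database = {
    "cscA": ["GGACGTGCAA", "ACGTGCTT"],
    "cscB": ["ACGTGCAAT"],  # shares k-mers with the query but has no repeat
    "cscC": ["ACGTGCGG", "CCACGTGC"],
}
print(search(query, database))  # -> [('cscA', 3), ('cscC', 1)]
```

In this toy run, cscB is dropped in the first step because none of its k-mers is repeated, even though it shares sequence with the query. A fuller treatment would also need to consider reverse-complement matches and variable-length elements, which this sketch leaves out.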

Support Year: 1
Fiscal Year: 2010
Total Cost: $333,185