The regulation of gene transcription is crucial for the function and development of all organisms. While gene prediction programs that identify protein-coding sequences have been used with remarkable success in the annotation of recently published genomes, the development of computational methods to analyse noncoding regions and delineate transcriptional control elements is still in its infancy. Using Drosophila melanogaster as a model, we intend to develop and validate computational strategies to locate the transcriptional control modules in the noncoding regions of higher eukaryotic genomes, where multiple inputs come together to control a gene in a particular context, and to parse these modules into individual transcription factor binding sites. Focusing on two transcriptional paradigms, the patterning of the early embryo by the segmentation genes and the specification of glial cell fate by the master transcriptional regulator Glial cells missing, we will develop algorithms that use raw genomic sequence together with supplemental information such as the regulatory region of the orthologous gene in a related species, examples of transcription factor binding sites pertinent to a certain class of sequence modules, and one or more related sequence modules with unknown protein binding sites. All computational strategies will rely on probabilistic models of how the genome encodes regulatory information and will be built in part on algorithms we have developed for yeast. They will exploit the frequent occurrence of multiple copies of the same binding motif in a module, and the enrichment of certain combinations of motifs in a module in comparison with the genome at large.
The proposed research involves close collaboration between a computational and an experimental group. To validate computational predictions in vivo, we will use reporter gene constructs to test putative regulatory modules, whole-mount in situ hybridizations to determine whether a gene is expressed in a specific tissue, and DNA chips for genome-wide expression profiling; the DNA chip data will also furnish raw input for further computational analysis. The results of the validation experiments will be used to refine and improve our algorithms. Finally, the analysis will be extended to other higher eukaryotic genomes, and the computational tools we develop will be made available to the scientific community at large.
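
The abstract's idea of exploiting multiple copies of the same binding motif within a module can be illustrated with a small sketch. This is not the project's algorithm: it is a generic position weight matrix (PWM) scan followed by a sliding-window count, and the matrix values, the uniform background frequency, the score threshold, the window size, and all function names are hypothetical choices made only for illustration.

```python
import math

# Hypothetical PWM for a 4-bp motif with consensus "ACGT" (illustrative values only).
PWM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds(site, pwm=PWM, bg=BACKGROUND):
    """Log2-odds score of a site under the PWM versus the background model."""
    # Bases missing from a column (e.g. 'N') score neutrally (log2(1) = 0).
    return sum(math.log2(col.get(base, bg) / bg) for col, base in zip(pwm, site))


def motif_hits(seq, pwm=PWM, threshold=5.0):
    """Start positions whose log-odds score reaches the threshold."""
    width = len(pwm)
    return [
        i
        for i in range(len(seq) - width + 1)
        if log_odds(seq[i:i + width], pwm) >= threshold
    ]


def clustered_windows(seq, pwm=PWM, window=20, min_hits=3, threshold=5.0):
    """Windows (start, hit_count) that fully contain at least min_hits motif hits."""
    width = len(pwm)
    hits = motif_hits(seq, pwm, threshold)
    result = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for h in hits if start <= h and h + width <= start + window)
        if count >= min_hits:
            result.append((start, count))
    return result


if __name__ == "__main__":
    # Toy sequence with three copies of the consensus packed close together.
    toy = "T" * 10 + "ACGT" + "TT" + "ACGT" + "TT" + "ACGT" + "T" * 10
    print(motif_hits(toy))
    print(clustered_windows(toy))
```

A fixed hit count per window is the simplest possible stand-in for the probabilistic models the abstract refers to; a more faithful approach would compare the likelihood of a window under a module model against the genome-wide background.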

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Exploratory/Developmental Grants Phase II (R33)
Project #: 5R33GM066434-03
Application #: 6767794
Study Section: Genome Study Section (GNM)
Program Officer: Tompkins, Laurie
Project Start: 2002-07-01
Project End: 2006-06-30
Budget Start: 2004-07-01
Budget End: 2005-06-30
Support Year: 3
Fiscal Year: 2004
Total Cost: $434,811
Indirect Cost:

Name: Rockefeller University
Department: Physics
Type: Other Domestic Higher Education
DUNS #: 071037113
City: New York
State: NY
Country: United States
Zip Code: 10065

Schroeder, Mark D; Greer, Christina; Gaul, Ulrike (2011) How to make stripes: deciphering the transition from non-periodic to periodic patterns in Drosophila segmentation. Development 138:3067-78