Transcription is at the heart of the regulation of gene expression, yet the computational analysis of transcription regulation currently faces a number of challenges and opportunities. The large number of sequenced genomes makes it possible to study and exploit the conservation of regulatory sequences, but algorithms that do so in a rigorous framework are still scarce. Detailed data on spatiotemporal gene expression have become available, enabling us to use this information to elucidate regulatory interactions in the development of complex organisms. The long-term goal is to build computational models to infer regulatory networks and their evolution in the development of model organisms and ultimately humans. The objective of this particular proposal is to develop algorithms to analyze the conservation of gene regulation at the sequence level, as well as an integrated approach to model conserved regulatory regions important for development.
Its specific aims are: (1) To decipher the precise requirements that define a functional transcription start site, based on a comparative study of the conservation of core promoter elements in two fly genomes, and to build a model for genome-wide comparative annotation. (2) To develop and implement an efficient progressive multiple alignment algorithm for non-coding regulatory sequences based on phylogenetic hidden Markov models, and to study the evolution of core promoters in a wider set of species. (3) To extend the framework set by this algorithm to more complex regulatory modules (such as developmental enhancers and E2F target genes), and to incorporate prior information on putative upstream factors to predict regulatory interactions. Computational predictions will be validated by a small number of experiments. The proposed research is expected to advance the understanding of the evolution of regulatory regions, and of how to build computational models that accurately utilize sequence information from several species.
Relevance to public health: Understanding how gene regulation is encoded in the genome is one of the most interesting challenges in molecular biology today. Errors in this machinery lead to misexpression of genes and may often be important in genetically based diseases. Our research will help to find the exact regulatory regions in DNA, both computationally and experimentally, and to learn the mechanisms that control the expression of genes in model organisms and humans.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-A (52))
Program Officer: Pazin, Michael J
Duke University
Biostatistics & Other Math Sci
Schools of Medicine
United States
Katzenberger, Rebeccah J; Rach, Elizabeth A; Anderson, Ashley K et al. (2012) The Drosophila Translational Control Element (TCE) is required for high-level transcription of many genes that are specifically expressed in testes. PLoS One 7:e45009
Natarajan, Anirudh; Yardimci, Galip Gürkan; Sheffield, Nathan C et al. (2012) Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res 22:1711-22
Rach, Elizabeth A; Winter, Deborah R; Benjamin, Ashlee M et al. (2011) Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level. PLoS Genet 7:e1001274
Ni, Ting; Corcoran, David L; Rach, Elizabeth A et al. (2010) A paired-end sequencing strategy to map the complex landscape of transcription initiation. Nat Methods 7:521-7
Parry, Trevor J; Theisen, Joshua W M; Hsu, Jer-Yuan et al. (2010) The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev 24:2013-8
Arunachalam, Manonmani; Jayasurya, Karthik; Tomancak, Pavel et al. (2010) An alignment-free method to identify candidate orthologous enhancers in multiple Drosophila genomes. Bioinformatics 26:2109-15
Majoros, William H; Ohler, Uwe (2010) Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs. PLoS Comput Biol 6:e1001037
Megraw, Molly; Pereira, Fernando; Jensen, Shane T et al. (2009) A transcription factor affinity-based code for mammalian transcription initiation. Genome Res 19:644-56
Majoros, William H; Ohler, Uwe (2009) Complexity reduction in context-dependent DNA substitution models. Bioinformatics 25:175-82
Yokoyama, Ken Daigoro; Ohler, Uwe; Wray, Gregory A (2009) Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships. Nucleic Acids Res 37:e92
