The full annotation of an organism's genome requires the systematic identification of cis-regulatory sequences and the trans-acting factors that bind them. For all organisms, a significant remaining impediment to this goal is the limited number of transcription factors (TFs) with well-characterized DNA-binding specificities. We have developed a bacterial one-hybrid system that provides a rapid method to characterize the DNA-binding specificities of TFs. Using this technology, we have determined the specificity of 15% (108/~750) of the predicted sequence-specific transcription factors in Drosophila melanogaster. This catalog of specificities includes proteins representing 12 different types of DNA-binding domains and all 84 independent homeodomain family members. To complement this dataset, we have developed computational tools that map the genomic distribution of TF binding site frequencies and use this information to identify putative cis-regulatory modules (CRMs) for any combination of TFs in our dataset. A web-based interface allows users to perform genome-wide searches for CRMs or to display binding site frequencies for TFs or combinations of TFs as tracks within the popular GBrowse interface.

We now propose to characterize the DNA-binding specificity of all remaining D. melanogaster TFs, including all monomeric and homo-oligomeric TFs as well as all functional heterodimeric combinations from the basic leucine zipper and basic helix-loop-helix families. We will also refine our computational tools to improve their ability to distinguish CRMs within the genome, and we will integrate other data sources (e.g. ChIP-chip datasets) to improve CRM prediction. This effort will culminate in the development of a web-accessible database and search tools that will allow the scientific community to computationally identify putative CRMs that are regulated by any combination of factors of interest. An outgrowth of our analysis will be genome-wide annotations of CRMs for subsets of factors that function in known transcriptional regulatory networks.

To date, a complete description of TF specificities has not been obtained in any organism. Combined with improved computational tools and the extensive and growing body of experimental studies on D. melanogaster transcription, a catalog of TF specificities will allow the systematic annotation of CRMs throughout its genome. Once developed, these databases and tools should be directly applicable to the annotation of CRMs in other organisms, including humans.
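
The idea of mapping binding site frequencies and flagging regions where a chosen combination of TFs is enriched can be illustrated with a minimal sketch. This is not the project's actual software: the function names, the log-odds scoring with a uniform background, the fixed score threshold, the forward-strand-only scan, and the sliding-window rule are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Tally base counts per position from equal-length example binding sites."""
    width = len(sites[0])
    return [{b: sum(site[i] == b for site in sites) for b in BASES}
            for i in range(width)]

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                       for b in BASES})
    return matrix

def site_positions(sequence, matrix, threshold):
    """Return forward-strand start positions whose score meets the threshold.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append(start)
    return hits

def candidate_windows(sequence, matrices, threshold, window, step, min_sites):
    """List (window_start, per-TF site counts) where every TF has >= min_sites sites.

    A site is assigned to a window by its start position only.
    """
    positions = {tf: site_positions(sequence, m, threshold)
                 for tf, m in matrices.items()}
    flagged = []
    last_start = max(len(sequence) - window, 0)
    for start in range(0, last_start + 1, step):
        counts = {tf: sum(start <= p < start + window for p in pos)
                  for tf, pos in positions.items()}
        if all(c >= min_sites for c in counts.values()):
            flagged.append((start, counts))
    return flagged

# Toy usage with hypothetical motifs and a made-up sequence.
matrices = {
    "tf_x": log_odds(count_matrix(["TAAT", "TAAT", "TAAT"])),
    "tf_y": log_odds(count_matrix(["GGCC", "GGCC", "GGCC"])),
}
sequence = "TAATCCTAATGGCCACGGCC" + "AC" * 10
windows = candidate_windows(sequence, matrices, threshold=4.0,
                            window=20, step=10, min_sites=2)
```

In this toy case only the first 20 bp window contains two sites for each hypothetical factor, so it is the only window flagged. A realistic tool would also need to scan both strands, model background composition, and weigh site strength rather than apply a single cutoff.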

Public Health Relevance

Although the genome project has extensively mapped which DNA sequences in humans and other organisms encode genes, mapping the regulatory regions that turn genes on and off has proven to be much more difficult. We will use newly developed experimental and computational tools to systematically map these control elements in an entire genome. This new genome "map" will help researchers understand how these elements function in normal cells and how mutations in these elements can lead to disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Feingold, Elise A
University of Massachusetts Medical School Worcester
Schools of Medicine
United States
Duque, Thyago; Samee, Md Abul Hassan; Kazemian, Majid et al. (2014) Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 31:184-200
Gupta, Ankit; Christensen, Ryan G; Bell, Heather A et al. (2014) An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins. Nucleic Acids Res 42:4800-12
Enuameh, Metewo Selase; Asriyan, Yuna; Richards, Adam et al. (2013) Global analysis of Drosophila Cys(2)-His(2) zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants. Genome Res 23:928-40
Kazemian, Majid; Pham, Hannah; Wolfe, Scot A et al. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res 41:8237-52
Cheng, Qiong; Kazemian, Majid; Pham, Hannah et al. (2013) Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 9:e1003571
Anderson, Douglas M; George, Rajani; Noyes, Marcus B et al. (2012) Characterization of the DNA-binding properties of the Mohawk homeobox transcription factor. J Biol Chem 287:35351-9
Christensen, Ryan G; Enuameh, Metewo Selase; Noyes, Marcus B et al. (2012) Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics 28:i84-9
Christensen, Ryan G; Gupta, Ankit; Zuo, Zheng et al. (2011) A modified bacterial one-hybrid system yields improved quantitative models of transcription factor specificity. Nucleic Acids Res 39:e83
Zhu, Lihua Julie; Christensen, Ryan G; Kazemian, Majid et al. (2011) FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res 39:D111-7
Kazemian, Majid; Blatti, Charles; Richards, Adam et al. (2010) Quantitative analysis of the Drosophila segmentation regulatory network using pattern generating potentials. PLoS Biol 8: