Human diseases often arise from excessive or deficient transcription of particular genes. Proteins known as transcription factors (TFs) regulate the transcription of a gene by binding to specific sites on DNA known as TF binding sites (TFBSs). One strategy for studying how TFs recognize TFBSs is to develop computer algorithms that predict TFBSs in genomic sequence. Most such algorithms apply position-specific weight matrices (PWMs) derived from the DNA sequences of known TFBSs. However, of the more than 3,000 TFs that have been identified or predicted in the human genome, PWMs have been built for only 300. Two approaches are used to extend the prediction methods to TFs without known PWMs. The first generates PWMs for families of TFs that share similar DNA-binding domains. However, TFs from the same family, as designated in the existing classifications of TFs, often do not recognize the same DNA sequences. Hence, there is a need for a new classification of TFs that can guide the prediction of TFBSs. The second approach builds PWMs from three-dimensional (3D) structures of TF-DNA complexes. As our preliminary results indicate, this approach can also be applied to families of TFs: a PWM can be obtained for a TF family through alignment of TFBS sequences and 3D structures of TF-DNA complexes. We propose to use the performance of the PWMs generated in this way for TF families to validate the classification of TFs. The goal of the proposal is to study how similarity in the sequences and structures of the DNA-binding domains of TFs relates to the similarity of their TFBSs. The research will focus on TFs for which at least one 3D structure of a TF-DNA complex, of the TF itself or of a close homolog, is available.
The goal will be attained through the following specific aims: (1) develop an automatic classification of the DNA-binding domains of TFs based on similarity of the sequences and structures of TFs and TFBSs; (2) develop a structure-based approach to the prediction of TFBSs for families of TFs; and (3) disseminate the results through a web resource that provides access to the classification and prediction methods in the form of queries and web tools. The results of the study will be valuable for annotating TFs and regulatory regions in the genomes of human and other organisms. They will also facilitate deciphering gene regulatory networks and designing drugs to treat diseases associated with inadequate gene regulation.
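
For readers unfamiliar with PWM-based prediction, the general idea referred to in the abstract can be sketched as follows. This is a minimal illustration, not the method proposed in the project: the function names (`build_pwm`, `score_window`, `scan`), the pseudocount, the uniform background frequency, the score threshold, and the toy binding sites are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding-site sequences.

    Assumes uppercase A/C/G/T only and a uniform background (illustrative choices).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # log2 of (smoothed base frequency / background frequency)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window of sequence."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy aligned sites, invented for illustration (not taken from the project).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
pwm = build_pwm(sites)
hits = scan(pwm, "AAATGACTCAGGG", threshold=5.0)
```

In this toy case the only window that passes the threshold is the one starting at offset 3, which matches the first toy site exactly. A real predictor would also scan the reverse strand and choose the threshold from a score distribution rather than a fixed value.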

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-GGG-A (52))
Program Officer: Tompkins, Laurie
University of California San Diego
Schools of Arts and Sciences
La Jolla
United States
Shih, Vincent F-S; Davis-Turak, Jeremy; Macal, Monica et al. (2012) Control of RelB during dendritic cell activation integrates canonical and noncanonical NF-κB pathways. Nat Immunol 13:1162-70
Baitaluk, Michael; Kozhenkov, Sergey; Ponomarenko, Julia (2012) An integrative approach to inferring gene regulatory module networks. PLoS One 7:e52836
Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia et al. (2012) IntegromeDB: an integrated system and biological search engine. BMC Genomics 13:35
Cheng, Christine S; Feldman, Kristyn E; Lee, James et al. (2011) The specificity of innate immune responses is enforced by repression of interferon response elements by NF-κB p50. Sci Signal 4:ra11
Kozhenkov, Sergey; Sedova, Mayya; Dubinina, Yulia et al. (2011) BiologicalNetworks--tools enabling the integration of multi-scale data for the host-pathogen studies. BMC Syst Biol 5:7
Baitaluk, Michael; Ponomarenko, Julia (2010) Semantic integration of data on transcriptional regulation. Bioinformatics 26:1651-61
Kozhenkov, Sergey; Dubinina, Yulia; Sedova, Mayya et al. (2010) BiologicalNetworks 2.0--an integrative view of genome biology data. BMC Bioinformatics 11:610