I propose new knowledge-based approaches for determining both the short sequences of DNA to which transcription factors bind and the ways in which the context of these sequences specifies regulatory control. I will apply these approaches to recently produced binding data from chromatin-immunoprecipitation microarray efforts in S. cerevisiae. First, I will incorporate information from three-dimensional structures of protein-DNA complexes into motif discovery algorithms to identify over-represented sequences in the bound intergenic regions. Second, I will use machine learning techniques to apply multiple criteria based on biological knowledge of promoter organization to identify which of the binding sites predicted by these algorithms are genuine. As a first application of new motifs found using these approaches, I will perform hypothesis-directed sequence analysis to reconcile differences between in vivo binding and the organization of binding sites in promoters. Following experimental validation, the discovered binding motifs and organizational rules will provide the basis for a deeper understanding of the underlying logic of promoter organization, and will have a direct impact on the study of transcriptional regulatory mechanisms in higher eukaryotes.
Gordon, D Benjamin; Nekludova, Lena; McCallum, Scott et al. (2005) TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21:3164-5