It is a major challenge to extract useful biological knowledge from the large amounts of data currently being generated by genome sequencing projects and related technologies such as DNA microarrays. The surprisingly low number of protein-coding genes found in the human genome underscores the importance of gene expression regulation as a determinant of organismal complexity. Failure of regulatory mechanisms plays a role in many human diseases. The general aim of our research is to further develop regression approaches as a new paradigm for the analysis of functional genomics data. Using simple models based on the molecular mechanisms that control transcription initiation, mRNA turnover, and chromatin remodeling, we will analyze genomic sequences as well as microarray data for mRNA expression and transcription factor binding. The research proposed here will build upon the success of REDUCE, our motif-based regression analysis tool for discovering cis-regulatory elements in non-coding DNA and inferring the activity of the regulatory factors that bind to these elements. Because each genome-wide mRNA expression pattern can be analyzed in isolation, it is possible to model how regulatory processes depend on environmental conditions.
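To make the regression idea concrete, here is a minimal sketch in the spirit of REDUCE; it is not REDUCE's actual implementation. It assumes that the log expression ratio of each gene g in a single expression profile is modeled as A_g = C + F * N_g, where N_g is the number of exact occurrences of a candidate motif in the gene's upstream sequence and the fitted slope F is read as the activity of the factor binding that motif. Candidate motifs are ranked by the fraction of variance they explain, which for a one-motif fit is equivalent to ranking by the reduction in residual sum of squares. The function names, toy sequences, and toy ratios are hypothetical, and for simplicity only one strand is scanned.

```python
# Hypothetical sketch of single-motif regression on one expression profile.
# Assumed model: A_g = C + F * N_g, with N_g = motif count upstream of gene g.


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact matches of motif on the given strand only."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def fit_single_motif(counts, ratios):
    """Ordinary least squares fit of ratios = C + F * counts.

    Returns (C, F, R^2), where R^2 is the fraction of variance explained.
    """
    n = len(ratios)
    mean_n = sum(counts) / n
    mean_a = sum(ratios) / n
    s_nn = sum((x - mean_n) ** 2 for x in counts)
    s_na = sum((x - mean_n) * (y - mean_a) for x, y in zip(counts, ratios))
    s_aa = sum((y - mean_a) ** 2 for y in ratios)
    if s_nn == 0 or s_aa == 0:
        # Motif count is constant across genes (or the profile is flat): no signal.
        return mean_a, 0.0, 0.0
    f = s_na / s_nn                  # inferred "activity" of the motif
    c = mean_a - f * mean_n          # intercept
    r2 = s_na ** 2 / (s_nn * s_aa)   # fraction of variance explained
    return c, f, r2


def rank_motifs(upstream, ratios, motifs):
    """Fit each candidate motif separately and return (motif, F, R^2), best first."""
    genes = sorted(ratios)
    y = [ratios[g] for g in genes]
    results = []
    for m in motifs:
        x = [count_motif(upstream[g], m) for g in genes]
        _, f, r2 = fit_single_motif(x, y)
        results.append((m, f, r2))
    return sorted(results, key=lambda t: t[2], reverse=True)


# Toy example: four hypothetical genes and one hypothetical expression profile.
upstream = {
    "g1": "ACGTGACGTG",
    "g2": "TTTTACGTGA",
    "g3": "GGGGCCCCAA",
    "g4": "CCCCAAAATT",
}
ratios = {"g1": 2.0, "g2": 1.0, "g3": 0.0, "g4": 0.1}

ranked = rank_motifs(upstream, ratios, ["ACGTG", "CCCC"])
for motif, activity, r2 in ranked:
    print(f"{motif}: F={activity:+.3f}, R^2={r2:.3f}")
```

Fitting one profile at a time is what lets the inferred activities F be compared across conditions. A real analysis would likely also scan the reverse complement, consider many more candidate motifs, and assess statistical significance rather than rely on R^2 alone.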
Specific aims are to: (1) increase the statistical power of REDUCE to detect degenerate motifs by incorporating algorithms based on suffix trees, position-specific scoring matrices, and gene-specific error estimation, and explore the use of comparative genomics to restrict the motif search to conserved regions; (2) associate transcription factors with their functional target genes in S. cerevisiae through integrated analysis of genome-wide transcription factor binding data and a large library of mRNA expression data; (3) uncover synergistic and competitive interactions between transcription factors through multivariate regression analysis of mRNA expression data in which such interactions are modeled explicitly, and analyze the possible context dependence of these interactions; (4) characterize cis-regulatory modules in Drosophila and their target genes by combining hidden Markov modeling of clustered transcription factor binding sites in non-coding DNA with regression analysis of mRNA expression data.
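Aim (3) calls for regression models in which interactions between transcription factors are modeled explicitly. The abstract does not specify the parameterization, so the following sketch assumes one common choice: a product term between the site counts N1_g and N2_g of two factors, A_g = C + F1*N1_g + F2*N2_g + F12*N1_g*N2_g, fitted by ordinary least squares through the normal equations. Under this assumption a positive F12 would suggest synergy and a negative F12 competition. The function names and the noise-free toy data are hypothetical.

```python
# Hypothetical sketch: multivariate regression with an explicit interaction term.
# Assumed model: A_g = C + F1*N1_g + F2*N2_g + F12*N1_g*N2_g + noise.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_interaction(n1, n2, ratios):
    """Least-squares fit via the normal equations; returns [C, F1, F2, F12]."""
    rows = [[1.0, a, b, a * b] for a, b in zip(n1, n2)]  # design matrix
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratios)) for i in range(p)]
    return solve(xtx, xty)


# Toy site counts for two hypothetical factors across eight genes.
n1 = [0, 1, 0, 1, 2, 1, 2, 0]
n2 = [0, 0, 1, 1, 1, 2, 0, 2]
# Noise-free ratios generated from hypothetical coefficients C=0.1, F1=0.5, F2=0.3, F12=1.0.
ratios = [0.1 + 0.5 * a + 0.3 * b + 1.0 * a * b for a, b in zip(n1, n2)]

coef = fit_interaction(n1, n2, ratios)
print("C=%.3f F1=%.3f F2=%.3f F12=%.3f" % tuple(coef))
```

Because the toy ratios are generated without noise from known coefficients, the fit should recover them up to rounding error. With real expression data the interaction estimate would need a significance test before being read as synergy or competition, and the normal-equation approach shown here is chosen for brevity rather than numerical robustness.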