With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell function, and their dysfunction can contribute significantly to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data from a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif-finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages. 
The resulting packages will be available to the scientific community both as stand-alone versions and as part of the Bioconductor Project, an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.