Protein-DNA interaction is a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism is challenging because protein-bound DNA is difficult to characterize on a genomic scale. The recent arrival of ultra-high-throughput sequencing technologies has revolutionized this field by allowing rapid, cost-effective, quantitative sequencing analysis of target DNAs. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, produces millions of short-read sequences, representing tags of DNAs bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we propose a hidden Markov model (HMM)-based algorithm to detect genomic regions that are significantly enriched in ChIP-Seq data. Our method will address complications such as sequencing bias and read alignment uncertainty. We also propose a multi-level hierarchical HMM that will allow integration of data from both ChIP-Seq and ChIP-chip. Next, we will build model-based de novo motif-finding strategies that utilize ChIP-Seq data. We believe that efficient mining of all sequences identified by ChIP-Seq will allow us to precisely characterize protein-DNA interaction sites. Our long-term biomedical research interest is prostate cancer. We will apply ChIP-Seq and the data analysis tools developed in this project to investigate transcriptional (dys)regulation in prostate cancer. We believe that effective data integration under a coherent probabilistic framework will eventually lead to an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.
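The abstract does not specify the structure of the proposed HMM. The sketch below shows only the general idea of HMM-based enrichment detection, under assumptions of our own: the genome is divided into fixed-width bins of read counts, each bin is hidden in either a "background" or an "enriched" state, counts follow a Poisson distribution with a state-specific rate, and the most likely state path is recovered by Viterbi decoding. The function names, Poisson rates, and transition probabilities are hypothetical illustrative choices, not the project's model. The sketch does not handle sequencing bias, alignment uncertainty, or ChIP-chip integration.

```python
import math


def poisson_logpmf(k, lam):
    # log P(K = k) for a Poisson distribution with rate lam
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_enriched(counts, lam_bg=1.0, lam_enr=8.0, p_stay=0.95, p_enr_start=0.05):
    """Hypothetical two-state HMM: label each bin 0 (background) or 1 (enriched).

    All parameters are illustrative assumptions; a real model would estimate
    them from data (e.g. with Baum-Welch) and correct for sequencing bias.
    """
    if not counts:
        return []
    rates = (lam_bg, lam_enr)
    # Symmetric "sticky" transitions favor contiguous enriched regions.
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    log_init = [math.log(1 - p_enr_start), math.log(p_enr_start)]

    # score[s]: best log-probability of any state path ending in state s
    score = [log_init[s] + poisson_logpmf(counts[0], rates[s]) for s in (0, 1)]
    back = []  # back[i][s]: best predecessor of state s at bin i + 1
    for k in counts[1:]:
        ptr, new = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(prev)
            new.append(score[prev] + log_trans[prev][s] + poisson_logpmf(k, rates[s]))
        back.append(ptr)
        score = new

    # Trace back from the best final state to recover the full path.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    bins = [0, 1, 0, 9, 12, 10, 1, 0, 0]
    print(viterbi_enriched(bins))  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

The high transition probability `p_stay` penalizes switching states. This is why an isolated count of 1 next to the enriched block is still labeled background, while a run of high counts is called as a single enriched region.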
Transcriptional regulation plays an important role in cancer progression. The statistical and computational strategies proposed here will help us gain an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.