Protein-DNA interaction is a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism is challenging because protein-bound DNA is difficult to characterize on a genomic scale. The recent arrival of ultra-high-throughput sequencing technologies has revolutionized this field by allowing rapid, cost-effective quantitative sequencing analysis of target DNAs. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, produces millions of short reads that serve as tags for DNA bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we propose a hidden Markov model (HMM)-based algorithm to detect genomic regions that are significantly enriched in ChIP-Seq data. Our method will address complications such as sequencing bias and read alignment uncertainty. We also propose a multi-level hierarchical HMM that will allow integration of data from both ChIP-Seq and ChIP-chip. Next, we will build model-based de novo motif finding strategies that utilize ChIP-Seq data. We believe that efficient mining of all sequences identified by ChIP-Seq will allow us to precisely characterize protein-DNA interaction sites. Our long-term biomedical research interest is prostate cancer. We will apply ChIP-Seq and the data analysis tools developed in this project to investigate transcriptional (dys)regulation in prostate cancer. We believe that effective data integration under a coherent probabilistic framework will eventually lead to an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.
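To make the HMM idea concrete, here is a minimal sketch, not the project's algorithm. It assumes reads have already been aligned and counted in fixed-width genomic bins, and it labels each bin as background (state 0) or enriched (state 1) using Poisson emissions and Viterbi decoding. The emission rates, the transition probability, and the function names `viterbi_enriched_bins` and `enriched_regions` are hypothetical choices for illustration. The sketch ignores the sequencing bias and read alignment uncertainty that the proposed method is meant to handle.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical two-state model: state 0 = background, state 1 = enriched.
STATES = (0, 1)


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_enriched_bins(
    counts: Sequence[int],
    rates: Tuple[float, float] = (1.0, 8.0),  # assumed mean reads/bin: background, enriched
    stay_prob: float = 0.95,                  # assumed probability of keeping the same state
    start: Tuple[float, float] = (0.9, 0.1),  # assumed initial state probabilities
) -> List[int]:
    """Return the most likely state (0 or 1) for each bin, computed in log space."""
    n = len(counts)
    if n == 0:
        return []
    log_trans = [
        [math.log(stay_prob), math.log(1 - stay_prob)],
        [math.log(1 - stay_prob), math.log(stay_prob)],
    ]
    # score[t][s]: best log-probability of any state path ending in state s at bin t
    score = [[0.0, 0.0] for _ in range(n)]
    back = [[0, 0] for _ in range(n)]
    for s in STATES:
        score[0][s] = math.log(start[s]) + poisson_logpmf(counts[0], rates[s])
    for t in range(1, n):
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(STATES, key=lambda p: score[t - 1][p] + log_trans[p][s])
            score[t][s] = (score[t - 1][prev] + log_trans[prev][s]
                           + poisson_logpmf(counts[t], rates[s]))
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    path = [max(STATES, key=lambda s: score[n - 1][s])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def enriched_regions(path: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge consecutive enriched bins into half-open [start, end) bin intervals."""
    regions: List[Tuple[int, int]] = []
    begin = None
    for i, s in enumerate(path):
        if s == 1 and begin is None:
            begin = i
        elif s == 0 and begin is not None:
            regions.append((begin, i))
            begin = None
    if begin is not None:
        regions.append((begin, len(path)))
    return regions
```

For example, with bin counts `[0, 1, 0, 2, 9, 12, 7, 10, 1, 0, 0]`, `viterbi_enriched_bins` labels bins 4 to 7 as enriched, and `enriched_regions` turns that path into `[(4, 8)]`. Working in log space avoids numerical underflow on long chromosomes. A high `stay_prob` penalizes state switches, which smooths isolated high-count bins into contiguous regions rather than calling each one separately.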

Public Health Relevance

Transcriptional regulation plays an important role in cancer progression. The statistical and computational strategies proposed here will help us gain an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J
Emory University
Biostatistics & Other Math Sci
Schools of Public Health
United States

Publications

Lin, Grace; LaPensee, Christopher R; Qin, Zhaohui S et al. (2014) Reciprocal occupancy of BCL6 and STAT5 on Growth Hormone target genes: contrasting transcriptional outcomes and promoter-specific roles of p300 and HDAC3. Mol Cell Endocrinol 395:19-31
Wu, L; Runkle, C; Jin, H-J et al. (2014) CCN3/NOV gene expression in human prostate cancer is directly suppressed by the androgen receptor. Oncogene 33:504-13
Cao, Qi; Wang, Xiaoju; Zhao, Meng et al. (2014) The central role of EED in the orchestration of polycomb group complexes. Nat Commun 5:3127
Cao, Fang; Townsend, Elizabeth C; Karatas, Hacer et al. (2014) Targeting MLL1 H3K4 methyltransferase activity in mixed-lineage leukemia. Mol Cell 53:247-61
Hu, Ming; Deng, Ke; Qin, Zhaohui et al. (2013) Bayesian inference of spatial organizations of chromosomes. PLoS Comput Biol 9:e1002893
Asangani, Irfan A; Ateeq, Bushra; Cao, Qi et al. (2013) Characterization of the EZH2-MMSET histone methyltransferase regulatory axis in cancer. Mol Cell 49:80-93
Gao, Shan; Xiong, Jie; Zhang, Chunchao et al. (2013) Impaired replication elongation in Tetrahymena mutants deficient in histone H3 Lys 27 monomethylation. Genes Dev 27:1662-79
Wu, Hao; Qin, Zhaohui S (2013) Exploring the cooccurrence patterns of multiple sets of genomic intervals. Biomed Res Int 2013:617545
Choi, Hyungwon; Fermin, Damian; Nesvizhskii, Alexey I et al. (2013) Sparsely correlated hidden Markov models with application to genome-wide location studies. Bioinformatics 29:533-41
Qin, Zhaohui S; Bilenky, Misha; Su, Gang et al. (2013) MotifOrganizer: a scalable model-based motif clustering tool for mammalian genomes. Front Biosci (Elite Ed) 5:785-97

The 10 most recent of 17 publications are listed.