Protein-DNA interaction constitutes a basic mechanism for genetic regulation of target gene expression. Deciphering this mechanism is challenging because protein-bound DNA is difficult to characterize on a genomic scale. The recent arrival of ultra-high-throughput sequencing technologies has revolutionized this field by allowing rapid, cost-effective quantitative sequencing analysis of target DNA. ChIP-Seq, which couples chromatin immunoprecipitation (ChIP) with next-generation sequencing, produces millions of short reads that serve as tags for DNA bound by specific transcription factors and other chromatin-associated proteins. The rapid accumulation of ChIP-Seq data has created a daunting analysis challenge. Here we propose a hidden Markov model (HMM)-based algorithm to detect genomic regions that are significantly enriched in ChIP-Seq data. Our method will address complications such as sequencing bias and read alignment uncertainty. We also propose a multi-level hierarchical HMM that will allow integration of data from both ChIP-Seq and ChIP-chip. Next, we will build model-based de novo motif-finding strategies that utilize ChIP-Seq data. We believe efficient mining of all sequences identified by ChIP-Seq will allow us to precisely characterize protein-DNA interaction sites. Our long-term biomedical research interest is in prostate cancer. We will apply ChIP-Seq and the data analysis tools developed in this project to investigate transcriptional (dys)regulation in prostate cancer. We believe effective data integration under a coherent probability framework will eventually lead to an in-depth understanding of the mechanisms mediating transcriptional regulation in prostate cancer progression.
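
To make the idea of HMM-based enrichment detection concrete, the following is a minimal sketch and not the algorithm proposed in this project. It assumes the genome has already been divided into fixed-width bins with one read count per bin, uses a two-state HMM (background vs. enriched) with Poisson emissions, and decodes the most likely state path with the Viterbi algorithm. The function names, the rates `lam_bg` and `lam_enriched`, and the self-transition probability `p_stay` are all hypothetical illustration values; the sketch ignores sequencing bias, alignment uncertainty, and ChIP-chip integration.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_enriched(counts, lam_bg=2.0, lam_enriched=10.0, p_stay=0.95):
    """Label each bin as background (0) or enriched (1).

    All parameters are assumed, fixed values chosen for illustration;
    a real analysis would estimate them from the data.
    """
    if not counts:
        return []
    rates = (lam_bg, lam_enriched)
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    # Initialise with a uniform prior over the two states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []  # back[t][s] = best predecessor of state s at bin t + 1

    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cand = [
                score[p] + (log_stay if p == s else log_switch)
                for p in (0, 1)
            ]
            best = 0 if cand[0] >= cand[1] else 1
            ptr.append(best)
            new_score.append(cand[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    bin_counts = [1, 2, 0, 3, 12, 15, 9, 11, 2, 1, 0, 2]
    print(viterbi_enriched(bin_counts))
    # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

In this toy input the four high-count bins are called as one contiguous enriched region, because the high self-transition probability penalizes switching states for a single noisy bin.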

Public Health Relevance

Transcription regulation plays an important role in cancer progression. The development of statistical and computational strategies proposed here will help us gain in-depth understanding of mechanisms mediating transcriptional regulation in prostate cancer progression.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
7R01HG005119-02
Application #
7895771
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Good, Peter J
Project Start
2009-07-22
Project End
2013-06-30
Budget Start
2010-08-01
Budget End
2011-06-30
Support Year
2
Fiscal Year
2010
Total Cost
$292,421
Indirect Cost
Name
Emory University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
066469933
City
Atlanta
State
GA
Country
United States
Zip Code
30322
Xu, Zheng; Zhang, Guosheng; Jin, Fulai et al. (2016) A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data. Bioinformatics 32:650-6
Xu, Tianlei; Li, Ben; Zhao, Meng et al. (2015) Base-resolution methylation patterns accurately predict transcription factor bindings in vivo. Nucleic Acids Res 43:2757-66
Chen, Li; Wang, Chi; Qin, Zhaohui S et al. (2015) A novel statistical method for quantitative comparison of multiple ChIP-seq datasets. Bioinformatics 31:1889-96
Li, Li; Lyu, Xiaowen; Hou, Chunhui et al. (2015) Widespread rearrangement of 3D chromatin organization underlies polycomb-mediated stress-induced silencing. Mol Cell 58:216-31
Cao, Qi; Wang, Xiaoju; Zhao, Meng et al. (2014) The central role of EED in the orchestration of polycomb group complexes. Nat Commun 5:3127
Wu, L; Runkle, C; Jin, H-J et al. (2014) CCN3/NOV gene expression in human prostate cancer is directly suppressed by the androgen receptor. Oncogene 33:504-13
Cao, Fang; Townsend, Elizabeth C; Karatas, Hacer et al. (2014) Targeting MLL1 H3K4 methyltransferase activity in mixed-lineage leukemia. Mol Cell 53:247-61
Yang, Rendong; Chen, Li; Newman, Scott et al. (2014) Integrated analysis of whole-genome paired-end and mate-pair sequencing data for identifying genomic structural variations in multiple myeloma. Cancer Inform 13:49-53
Lin, Grace; LaPensee, Christopher R; Qin, Zhaohui S et al. (2014) Reciprocal occupancy of BCL6 and STAT5 on Growth Hormone target genes: contrasting transcriptional outcomes and promoter-specific roles of p300 and HDAC3. Mol Cell Endocrinol 395:19-31
Asangani, Irfan A; Ateeq, Bushra; Cao, Qi et al. (2013) Characterization of the EZH2-MMSET histone methyltransferase regulatory axis in cancer. Mol Cell 49:80-93

Showing the most recent 10 out of 27 publications