With many genome-sequencing projects coming to an end, the biggest remaining challenge is to comprehend the information encoded in these sequences. Identifying interactions between transcription factors (TFs) and their DNA binding sites is an integral part of this challenge. These interactions control critical steps in cell function, and their dysfunction can contribute significantly to the progression of various diseases. ChIP-chip experiments, which couple chromatin immunoprecipitation with DNA microarray analysis, have become powerful tools for the genome-wide identification and characterization of transcription factor binding sites. These experiments produce massive amounts of noisy data with a small number of replicates and therefore require innovative, robust statistical analysis methods. The objectives of this proposal are to develop, evaluate, and disseminate statistical methods for analyzing data from ChIP-chip experiments. These objectives will be accomplished through four specific aims: (1) Development of robust probabilistic methods for detecting TF-bound regions. These methods will utilize the information common across probes on tiling arrays to increase power in small sample sizes. (2) Extension of the methods in Aim 1 to deal with array designs where probe sequences overlap and observations from nearby probes exhibit long-range spatial dependencies. As a result, we will develop rigorous statistical inference procedures for general tiling array designs. (3) Development of an adaptive framework for incorporating quantitative information from ChIP-chip experiments into motif finding. This will connect the first stage of ChIP-chip data analysis, namely identification of the bound regions, with the downstream sequence analysis, thereby boosting the sensitivity and specificity of the motif finding task. (4) Implementation of the statistical methods developed as part of this research in statistical packages.
The resulting packages will be available to the scientific community both as stand-alone versions and as part of the Bioconductor Project, which is an open source and open development software project for the analysis of genomic data. Successful completion of the proposed research will result in substantially improved statistical methods for the analysis of ChIP-chip experiments.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Good, Peter J
University of Wisconsin Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Shin, Sunyoung; Keleş, Sündüz (2017) Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data. Stat Biosci 9:50-72
Welch, Rene; Chung, Dongjun; Grass, Jeffrey et al. (2017) Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments. Nucleic Acids Res 45:e145
Papale, Ligia A; Li, Sisi; Madrid, Andy et al. (2016) Sex-specific hippocampal 5-hydroxymethylcytosine is disrupted in response to acute stress. Neurobiol Dis 96:54-66
Li, Sisi; Papale, Ligia A; Zhang, Qi et al. (2016) Genome-wide alterations in hippocampal 5-hydroxymethylcytosine links plasticity genes to acute stress. Neurobiol Dis 86:99-108
Yao, Chen; Chen, Brian H; Joehanes, Roby et al. (2015) Integromic analysis of genetic variation and gene expression identifies networks for cardiovascular disease phenotypes. Circulation 131:536-49
Papale, Ligia A; Zhang, Qi; Li, Sisi et al. (2015) Genome-wide disruption of 5-hydroxymethylcytosine in a mouse model of autism. Hum Mol Genet 24:7121-31
Marty, Amber J; Broman, Aimee T; Zarnowski, Robert et al. (2015) Fungal Morphology, Iron Homeostasis, and Lipid Metabolism Regulated by a GATA Transcription Factor in Blastomyces dermatitidis. PLoS Pathog 11:e1004959
Zeng, Xin; Li, Bo; Welch, Rene et al. (2015) Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. PLoS Comput Biol 11:e1004491
Hewitt, Kyle J; Kim, Duk Hyoung; Devadas, Prithvia et al. (2015) Hematopoietic Signaling Mechanism Revealed from a Stem/Progenitor Cell Cistrome. Mol Cell 59:62-74
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan et al. (2015) Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data. Cancer Inform 13:123-31

Showing the most recent 10 out of 46 publications