The advent of high-throughput next-generation sequencing (NGS) technologies has revolutionized the fields of genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. Among NGS applications, ChIP-seq (chromatin immunoprecipitation followed by NGS) is perhaps the most successful to date. ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and to map epigenomic marks. Both play crucial roles in programming cell-specific gene expression; therefore, their genome-wide mapping can significantly advance our ability to understand and diagnose human diseases. Although basic analysis tools for ChIP-seq data are rapidly increasing in number, there has been little progress on design problems for ChIP-seq experiments. A challenging question that researchers planning a ChIP-seq experiment need to answer is: how deeply should the ChIP and control samples be sequenced? The answer depends on multiple factors, some of which can be set by the experimenter based on pilot or preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether all the underlying targets (e.g., binding locations or epigenomic profiles) can be identified with a targeted power. This is especially important when the goal is the analysis of individual-to-individual and allele-specific variation of transcription factor binding and epigenomic profiles. Insufficient sequencing depth may lead to spurious differences in binding or epigenomic profiles. In this proposal, we aim to develop a general framework for power calculations in ChIP-seq experiments, considering statistical models commonly used in ChIP-seq analysis, through three specific aims: (1) power calculations based on the conditional Binomial model; (2) power calculations based on the Poisson and Negative Binomial regression models; (3) a power calculation tool for GALAXY and Bioconductor.
This project will be accomplished through a combination of theoretical and methodological development, simulation, computational analysis, and experimental validation. Methods will be developed and evaluated using datasets from the ENCODE, modENCODE, and RoadMap Epigenomics consortia, as well as novel datasets from collaborators. Statistical resources generated by the project will be disseminated as publicly available software and will provide essential tools for the efficient design of ChIP-seq experiments.
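
To make the first aim concrete, the following is a minimal sketch of a power calculation under a conditional Binomial model. It is an illustration under stated assumptions, not the framework developed in this project. It assumes that, for a single genomic bin, the ChIP read count given the total (ChIP plus control) count n is Binomial(n, p); that under no enrichment p equals the ChIP share of the total sequencing depth; and that a hypothetical fold-enrichment parameter `fold` scales the ChIP rate. All function and parameter names here are illustrative.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chip_share(depth_chip, depth_ctrl, fold=1.0):
    """Assumed success probability: the ChIP share of reads in a bin.

    fold=1.0 gives the null value (no enrichment); fold>1 is a
    hypothetical enrichment of the ChIP rate over the control rate.
    """
    return fold * depth_chip / (fold * depth_chip + depth_ctrl)


def conditional_binomial_power(n, p0, p1, alpha=0.05):
    """Power of the one-sided exact Binomial test of p = p0 against p = p1 > p0.

    Returns (critical_value, power), where critical_value is the smallest
    count c with P(X >= c | n, p0) <= alpha.
    """
    for c in range(n + 2):  # c = n + 1 always satisfies the bound (tail is 0)
        if binom_upper_tail(c, n, p0) <= alpha:
            return c, binom_upper_tail(c, n, p1)


# Example: equal ChIP and control depth, twofold enrichment,
# evaluated for a few assumed total read counts in the bin.
p0 = chip_share(depth_chip=20.0, depth_ctrl=20.0)
p1 = chip_share(depth_chip=20.0, depth_ctrl=20.0, fold=2.0)
for n in (10, 50, 100):
    c, power = conditional_binomial_power(n, p0, p1)
    print(f"n={n:3d}  critical value={c:3d}  power={power:.3f}")
```

In this simplified setting, deeper sequencing enters only through the total count n in the bin. A fuller treatment would also have to account for the randomness of n and for testing many bins at once, neither of which this sketch attempts.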

Public Health Relevance

The proposed research is relevant to public health because capturing genome-wide binding of transcription factors and epigenomic information with ChIP-seq technology is invaluable for comprehensively understanding development, differentiation, and disease. The design of ChIP-seq experiments presents unprecedented challenges. We will develop a statistical framework for power calculations in designing ChIP-seq experiments and disseminate results and software to the research community.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG006716-01
Application #: 8284083
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J
Project Start: 2012-05-01
Project End: 2014-03-31
Budget Start: 2012-05-01
Budget End: 2013-03-31
Support Year: 1
Fiscal Year: 2012
Total Cost: $184,085
Indirect Cost: $59,085

Name: University of Wisconsin Madison
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 161202122
City: Madison
State: WI
Country: United States
Zip Code: 53715

Sun, Guannan; Srinivasan, Rajini; Lopez-Anido, Camila et al. (2014) In silico pooling of ChIP-seq control experiments. PLoS One 9:e109691
Zuo, Chandler; Keles, Sunduz (2014) A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics 30:753-60
Chung, Dongjun; Park, Dan; Myers, Kevin et al. (2013) dPeak: high resolution identification of transcription factor binding sites from PET and SET ChIP-Seq data. PLoS Comput Biol 9:e1003246
Myers, Kevin S; Yan, Huihuang; Ong, Irene M et al. (2013) Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding. PLoS Genet 9:e1003565