Statistical Power Calculations for ChIP-seq experiments

Keles, Sunduz

Abstract

The advent of high-throughput next-generation sequencing (NGS) technologies has revolutionized the fields of genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. Among NGS applications, ChIP-seq (chromatin immunoprecipitation followed by NGS) is perhaps the most successful to date. ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and to map epigenomic marks. Both play crucial roles in programming cell-specific gene expression; therefore, their genome-wide mapping can significantly advance our ability to understand and diagnose human diseases. Although basic analysis tools for ChIP-seq data are multiplying rapidly, there has been little progress on the design problems of ChIP-seq experiments. A challenging question that researchers planning a ChIP-seq experiment need to answer is: how deeply should the ChIP and control samples be sequenced? The answer depends on multiple factors, some of which can be set by the experimenter based on pilot or preliminary data. The sequencing depth of a ChIP-seq experiment is one of the key factors that determine whether all the underlying targets (e.g., binding locations or epigenomic profiles) can be identified with a target power. This is especially important when the goal is to analyze individual-to-individual and allele-specific variation of transcription factor binding and epigenomic profiles. Insufficient sequencing depth may lead to spurious differences in binding or epigenomic profiles. In this proposal, we aim to develop a general framework for power calculations in ChIP-seq experiments, considering statistical models commonly used in ChIP-seq analysis, through three specific aims: (1) power calculations based on the conditional Binomial model; (2) power calculations based on the Poisson and Negative Binomial regression models; (3) a power calculation tool for GALAXY and Bioconductor.
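The conditional Binomial idea in aim (1) can be illustrated with a textbook calculation. This is a minimal sketch, assuming a single candidate region whose ChIP and control read counts are independent Poisson variables scaled by sequencing depth. Conditioning on the total count turns the ChIP-versus-control comparison into an exact one-sided Binomial test, and averaging that test's rejection probability over the Poisson distribution of the total gives the power at a given pair of depths. The function and parameter names (`conditional_binomial_power`, `d_chip`, `d_ctrl`, `mu`, `fold`) and the default alpha = 1e-3 are hypothetical choices for illustration, not the project's software or its actual model; a genome-wide design would also need multiple-testing control and region-specific background rates.

```python
import math


def binom_tail(n, p):
    """Return tail[c] = P(X >= c) for X ~ Binomial(n, p), c = 0..n+1.

    The pmf is built in log space (via lgamma) so large n does not overflow.
    Assumes 0 < p < 1.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    pmf = [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]
    tail = [0.0] * (n + 2)  # tail[n + 1] = 0 by definition
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + pmf[k]
    return tail


def poisson_pmf(n, lam):
    """P(N = n) for N ~ Poisson(lam), lam > 0."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def conditional_binomial_power(d_chip, d_ctrl, mu, fold, alpha=1e-3):
    """Power of a one-sided conditional Binomial test for one candidate region.

    Hypothetical model (an assumption, not the project's): ChIP reads
    X ~ Poisson(mu * d_chip * fold) and control reads Y ~ Poisson(mu * d_ctrl),
    where d_chip and d_ctrl are sequencing depths in millions of reads and mu is
    the expected background read count per million reads in the region.
    Given N = X + Y = n, X ~ Binomial(n, p1), p1 = d_chip*fold / (d_chip*fold + d_ctrl).
    Under H0 (no enrichment, fold = 1), p0 = d_chip / (d_chip + d_ctrl).
    """
    p0 = d_chip / (d_chip + d_ctrl)
    p1 = d_chip * fold / (d_chip * fold + d_ctrl)
    lam = mu * (d_chip * fold + d_ctrl)           # mean of the total count N
    n_max = int(lam + 10 * math.sqrt(lam) + 10)   # truncate the negligible Poisson tail
    power = 0.0
    for n in range(n_max + 1):
        tail0 = binom_tail(n, p0)
        # Smallest critical value c with P0(X >= c | N = n) <= alpha (exact level-alpha test).
        c = next(c for c in range(n + 2) if tail0[c] <= alpha)
        # Weight the conditional rejection probability under H1 by P(N = n).
        power += poisson_pmf(n, lam) * binom_tail(n, p1)[c]
    return power


# Usage: power for a 3-fold enriched region at equal ChIP and control depths.
for depth in (10, 20, 40):
    print(depth, round(conditional_binomial_power(depth, depth, mu=0.5, fold=3.0), 3))
```

Rerunning the function over a grid of depths shows how power for a fixed fold enrichment grows with sequencing depth. Because the exact test is discrete, the curve need not be strictly monotone, and under fold = 1 the returned value is the test's size, which cannot exceed alpha.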
This project will be accomplished through a combination of theoretical and methodological development, simulation, computational analysis, and experimental validation. Methods will be developed and evaluated using datasets from the ENCODE, modENCODE, and Roadmap Epigenomics consortia, as well as novel datasets from collaborators. The statistical resources generated by the project, which will be disseminated in publicly available software, will provide essential tools for the efficient design of ChIP-seq experiments.

Public Health Relevance

The proposed research is relevant to public health because capturing genome-wide binding of transcription factors and epigenomic information by ChIP-seq technology is invaluable for comprehensively understanding development, differentiation, and disease. The design of ChIP-seq experiments presents unprecedented challenges. We will develop a statistical framework for power calculations in designing ChIP-seq experiments and disseminate the results and software to the research community.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG006716-02
Application #: 8463012
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J

Project Start: 2012-05-01
Project End: 2014-03-31
Budget Start: 2013-04-01
Budget End: 2014-03-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $184,085
Indirect Cost: $59,085

Institution

Name: University of Wisconsin Madison
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 161202122

City: Madison
State: WI
Country: United States
Zip Code: 53715

Related projects
NIH 2013 R21 HG	Statistical Power Calculations for ChIP-seq experiments Keles, Sunduz / University of Wisconsin Madison	$184,085
NIH 2012 R21 HG	Statistical Power Calculations for ChIP-seq experiments Keles, Sunduz / University of Wisconsin Madison	$184,085

Publications

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J et al. (2016) A Hierarchical Framework for State-Space Matrix Inference and Clustering. Ann Appl Stat 10:1348-1372

Sun, Guannan; Srinivasan, Rajini; Lopez-Anido, Camila et al. (2014) In silico pooling of ChIP-seq control experiments. PLoS One 9:e109691

Zuo, Chandler; Keleş, Sündüz (2014) A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics 30:753-60

Myers, Kevin S; Yan, Huihuang; Ong, Irene M et al. (2013) Genome-scale analysis of Escherichia coli FNR reveals complex features of transcription factor binding. PLoS Genet 9:e1003565

Zeng, Xin; Sanalkumar, Rajendran; Bresnick, Emery H et al. (2013) jMOSAiCS: joint analysis of multiple ChIP-seq datasets. Genome Biol 14:R38

Chung, Dongjun; Park, Dan; Myers, Kevin et al. (2013) dPeak: high resolution identification of transcription factor binding sites from PET and SET ChIP-Seq data. PLoS Comput Biol 9:e1003246

Liang, Kun; Keles, Sunduz (2012) Normalization of ChIP-seq data with control. BMC Bioinformatics 13:199

Johnson, Kirby D; Hsu, Amy P; Ryu, Myung-Jeom et al. (2012) Cis-element mutated in GATA2-dependent immunodeficiency governs hematopoiesis and vascular integrity. J Clin Invest 122:3692-704
