Statistical Analysis Methods and Software for ChIP-seq Data

Keles, Sunduz

Abstract

The advent of high-throughput next-generation sequencing (NGS) technologies has revolutionized the fields of genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. Among NGS applications, ChIP-seq (chromatin immunoprecipitation followed by NGS) is perhaps the most successful to date. ChIP-seq enables investigators to study the genome-wide binding of transcription factors and to map epigenomic marks. Both play crucial roles in programming gene expression in a cell-specific manner; therefore, their genome-wide mapping can significantly advance our ability to understand and diagnose human diseases. Although the number of basic analysis tools for ChIP-seq data is increasing rapidly, all available methods share one or more of the following shortcomings. First, they analyze one ChIP-seq sample at a time. As ChIP-seq becomes commonly used in epigenome mapping to understand phenotypic variation, the demand for methods that can handle multiple samples efficiently is rising rapidly. Second, they use only sequence reads that align to unique locations on the reference genome, which hinders the study of highly repetitive genomic regions by ChIP-seq. Third, commonly used designs for ChIP-seq experiments employ one matching control sample per ChIP-seq sample. This limits the genome coverage of control experiments and weakens the detection of enrichment in ChIP samples. It also contributes significantly to increased sequencing costs for large-scale ChIP-seq studies. The objective of this project is to address these challenges of ChIP-seq analysis through three specific aims: (1) statistical methods for inference from multiple samples; (2) probabilistic models for utilizing reads that map to multiple locations (multi-reads) in the genome; and (3) development and evaluation of in silico pooling designs for control experiments.
These aims will be accomplished through a combination of methodological development, simulation, computational analysis, and experimental validation. Methods will be developed and evaluated using datasets from the ENCODE, modENCODE, and Roadmap Epigenomics consortia, as well as novel datasets from collaborators. The statistical resources generated by the project will be disseminated as publicly available software and will provide essential tools for the efficient design and analysis of ChIP-seq experiments.
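Aim (2) concerns reads that align equally well to several genomic locations. The sketch below is a rough illustration only and is not the project's model. It shows one common idea for using such multi-reads: expectation-maximization (EM) allocation. Each multi-read is split fractionally across its candidate locations in proportion to the estimated abundance at those locations, and the abundance is re-estimated from the expected counts. The bin-level representation, the function name `allocate_multireads`, and the convergence settings are hypothetical choices made for illustration.

```python
from collections import defaultdict


def allocate_multireads(alignments, n_iter=100, tol=1e-8):
    """Hypothetical sketch: EM estimate of fractional read counts per genomic bin.

    alignments[r] is the list of bin IDs that read r aligns to: one ID for a
    uniquely mapping read, several IDs for a multi-read. Every read is assumed
    to have at least one alignment. Returns a dict mapping bin ID to the
    expected number of reads.
    """
    n_reads = len(alignments)
    bins = {b for cand in alignments for b in cand}
    # Start from a uniform abundance over every bin that received an alignment.
    pi = {b: 1.0 / len(bins) for b in bins}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidate bins in proportion to pi.
        # A unique read has a single candidate, so it contributes weight 1.
        for cand in alignments:
            total = sum(pi[b] for b in cand)
            for b in cand:
                counts[b] += pi[b] / total
        # M-step: the new abundance is each bin's expected share of all reads.
        new_pi = {b: counts[b] / n_reads for b in bins}
        delta = max(abs(new_pi[b] - pi[b]) for b in bins)
        pi = new_pi
        if delta < tol:
            break
    return {b: pi[b] * n_reads for b in bins}


if __name__ == "__main__":
    # Three unique reads in bin A, one in bin B, and two reads that map to both.
    reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
    print(allocate_multireads(reads))  # approximately {'A': 4.5, 'B': 1.5}
```

In this toy example, the two ambiguous reads are not discarded. They are split 3:1 toward bin A, following the evidence from the unique reads. This is how such a model can recover signal in repetitive regions that a unique-reads-only analysis would miss.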

Public Health Relevance

The proposed research is relevant to public health because capturing the genome-wide binding of transcription factors and epigenomic information with ChIP-seq technology is invaluable for comprehensively understanding development, differentiation, and disease. ChIP-seq experiments present unprecedented challenges in statistical analysis. We will develop statistical methods and tools for challenging aspects of ChIP-seq analysis and disseminate the results and software to the research community.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG003747-05A1
Application #: 8370723
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J

Project Start: 2005-07-01
Project End: 2015-12-31
Budget Start: 2013-01-17
Budget End: 2013-12-31
Support Year: 5
Fiscal Year: 2013
Total Cost: $295,243
Indirect Cost: $89,977

Institution

Name: University of Wisconsin Madison
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 161202122

City: Madison
State: WI
Country: United States
Zip Code: 53715

Publications

Zhang, Qi; Keles, Sündüz (2018) An empirical Bayes test for allelic-imbalance detection in ChIP-seq. Biostatistics 19:546-561

Zuo, Chandler; Chen, Kailei; Keleş, Sündüz (2017) A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets. J Comput Biol 24:472-485

Kim, TaeWon; Havighurst, Thomas; Kim, KyungMann et al. (2017) RNA-Binding Protein IGF2BP1 in Cutaneous Squamous Cell Carcinoma. J Invest Dermatol 137:772-775

Otlu, Burçak; Firtina, Can; Keles, Sündüz et al. (2017) GLANET: genomic loci annotation and enrichment tool. Bioinformatics 33:2818-2828

Welch, Rene; Chung, Dongjun; Grass, Jeffrey et al. (2017) Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments. Nucleic Acids Res 45:e145

Shin, Sunyoung; Keleş, Sündüz (2017) Annotation Regression for Genome-Wide Association Studies with an Application to Psychiatric Genomic Consortium Data. Stat Biosci 9:50-72

Kreimer, Anat; Zeng, Haoyang; Edwards, Matthew D et al. (2017) Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 38:1240-1250

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J et al. (2016) A Hierarchical Framework for State-Space Matrix Inference and Clustering. Ann Appl Stat 10:1348-1372

Papale, Ligia A; Li, Sisi; Madrid, Andy et al. (2016) Sex-specific hippocampal 5-hydroxymethylcytosine is disrupted in response to acute stress. Neurobiol Dis 96:54-66

Li, Sisi; Papale, Ligia A; Zhang, Qi et al. (2016) Genome-wide alterations in hippocampal 5-hydroxymethylcytosine links plasticity genes to acute stress. Neurobiol Dis 86:99-108

The 10 most recent of 51 publications are listed.
