Statistical and Computational Tools for Next-generation ChIP-seq Applications

Ji, Hongkai

Abstract

ChIP-seq is a powerful technology for mapping genome-wide protein-DNA interactions (PDIs). It is increasingly used by scientists worldwide to study how gene activities are controlled in normal cells and why they are disrupted in diseases. Applying ChIP-seq to study gene regulation faces three major challenges: (1) how to analyze large ChIP-seq data sets to discover dynamic changes of gene regulation across different biological contexts, (2) how to infer global regulatory programs under the practical constraint that it is not feasible to conduct ChIP-seq for all transcription factors (TFs), and (3) how to analyze allele-specific events given the small amount of data at heterozygous SNPs, which results in low statistical power. This study investigates novel statistical and computational solutions to these challenges. First, a new method will be developed to discover and characterize dynamic changes of gene regulation across different biological contexts. This method, Generalized Differential Principal Component Analysis (dPCA/GDPCA), integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single statistical framework. It provides a systematic solution for analyzing quantitative and curve-shape changes in large ChIP-seq data sets involving multiple proteins. It is expected to have a wide range of applications. Second, a computational framework will be developed to predict global gene regulation dynamics, i.e., dynamic changes of downstream regulatory events of all TFs for which DNA binding motif information is available. The analysis integrates the dynamic changes of histone modification ChIP-seq, DNase-seq, and FAIRE-seq data with DNA sequences, public ChIP-seq data, and public gene expression data. It will provide a practical, affordable, and reasonably accurate solution for using ChIP-seq to study many TFs simultaneously.
A systematic benchmark study will also be conducted to evaluate the impact of technologies, data types, and analytical methods on prediction performance. This benchmark study will provide guidelines for designing informative future experiments. Third, a method for detecting allele-specific protein-DNA binding (ASB) will be developed. The method will integrate information from multiple ChIP-seq data sets and completely phased genome sequences to significantly improve the statistical power of ASB inference. Various sources of bias will also be handled. Guidelines and new analytical tools generated by this study will allow one to design informative ChIP-seq experiments in the future such that, by collecting one set of ChIP-seq data, one can not only identify locations of PDIs but also infer global dynamic changes of TF binding sites across different biological contexts and, if genotype data are available, robustly analyze allele-specific gene regulation. This will make ChIP-seq a low-cost, high-reward experiment that serves multiple purposes. By significantly expanding the utility and increasing the power of ChIP-seq, our computational infrastructure is expected to have a major impact on advancing future studies of gene regulation and dissections of regulatory mechanisms behind human diseases.
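To illustrate why pooling information across data sets can raise the power of ASB inference at a heterozygous SNP, the following is a minimal sketch only, not the method proposed in this project. It assumes a plain exact binomial test against a 50/50 allelic ratio, naive summation of read counts across data sets, and made-up read counts; the function names are hypothetical. It ignores the biases (for example, mapping bias and overdispersion) that the abstract says a real method must handle.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def pooled_asb_p(counts):
    """Naively pool (ref, alt) read counts across data sets, then test.

    Assumption: every data set shares the same underlying allelic ratio.
    """
    ref = sum(r for r, _ in counts)
    alt = sum(a for _, a in counts)
    return binomial_two_sided_p(ref, ref + alt)


# Illustrative (made-up) reference/alternative read counts at one
# heterozygous SNP in three ChIP-seq data sets.
counts = [(7, 3), (8, 2), (6, 4)]

per_dataset = [binomial_two_sided_p(r, r + a) for r, a in counts]
pooled = pooled_asb_p(counts)

print("per-data-set p-values:", per_dataset)
print("pooled p-value:", pooled)
```

With these toy counts, no single data set reaches significance at the 0.05 level, while the pooled counts do, which is the low-power problem at heterozygous SNPs in miniature.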

Public Health Relevance

ChIP-seq is a powerful technology for analyzing how gene activities are controlled in normal cells and in disease. This proposal develops statistical and computational tools urgently needed by scientists to analyze large and complex ChIP-seq data sets. By allowing one to examine dynamic changes of global gene regulatory programs across different biological contexts, the new computational technologies developed in this proposal are expected to have a major impact on advancing future studies of regulatory mechanisms behind human diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006841-02
Application #: 8543753
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J

Project Start: 2012-09-12
Project End: 2015-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $309,420
Indirect Cost: $100,876

Institution

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2014 R01 HG	Statistical and Computational Tools for Next-generation ChIP-seq Applications Ji, Hongkai / Johns Hopkins University	$317,520
NIH 2013 R01 HG	Statistical and Computational Tools for Next-generation ChIP-seq Applications Ji, Hongkai / Johns Hopkins University	$309,420
NIH 2012 R01 HG	Statistical and Computational Tools for Next-generation ChIP-seq Applications Ji, Hongkai / Johns Hopkins University	$324,000

Publications

Kuang, Zheng; Ji, Zhicheng; Boeke, Jef D et al. (2018) Dynamic motif occupancy (DynaMO) analysis identifies transcription factors and their binding sites driving dynamic biological processes. Nucleic Acids Res 46:e2

Fan, Jianqing; Liu, Han; Sun, Qiang et al. (2018) I-LAMM FOR SPARSE LEARNING: SIMULTANEOUS CONTROL OF ALGORITHMIC COMPLEXITY AND STATISTICAL ERROR. Ann Stat 46:814-841

Zhou, Weiqiang; Sherwood, Ben; Ji, Zhicheng et al. (2017) Genome-wide prediction of DNase I hypersensitivity using gene expression. Nat Commun 8:1038

Kuang, Zheng; Pinglay, Sudarshan; Ji, Hongkai et al. (2017) Msn2/4 regulate expression of glycolytic enzymes and control transition from quiescence to growth. Elife 6:

Ji, Zhicheng; Zhou, Weiqiang; Ji, Hongkai (2017) Single-cell regulome data analysis by SCRAT. Bioinformatics 33:2930-2932

Zhao, Tuo; Liu, Han (2016) Accelerated Path-following Iterative Shrinkage Thresholding Algorithm with Application to Semiparametric Graph Estimation. J Comput Graph Stat 25:1272-1296

Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai (2016) Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods, and Challenges. Hum Hered 81:88-105

Zhao, Tianqi; Cheng, Guang; Liu, Han (2016) A PARTIALLY LINEAR FRAMEWORK FOR MASSIVE HETEROGENEOUS DATA. Ann Stat 44:1400-1437

Kang, Jian; Bowman, F DuBois; Mayberg, Helen et al. (2016) A depression network of functionally connected regions discovered via multi-attribute canonical correlation graphs. Neuroimage 141:431-441

Wamaitha, Sissy E; del Valle, Ignacio; Cho, Lily T Y et al. (2015) Gata6 potently initiates reprograming of pluripotent and differentiated cells to extraembryonic endoderm stem cells. Genes Dev 29:1239-55

Showing the most recent 10 out of 22 publications
