Statistical methods for improving reproducibility and utility of chromatin interaction data

Li, Qunhua

Abstract

The spatial organization of the genome in the nucleus plays an important role in the transcriptional control of genes. Currently, Hi-C is the most widely used high-throughput technique that probes the genome-wide spatial organization of chromatin. However, Hi-C experiments involve multiple complex experimental steps, introducing various sources of bias. Many data-analytical challenges must still be overcome to reach reliable and reproducible biological interpretations of the data. The small sample size of each individual study further limits the power and reliability of data analyses. When replicate samples are available, reproducibility across replicate samples informs us about the fidelity of the identification, and it can potentially be used to detect reproducible signals that are too modest to be detected reliably in individual samples. Even for samples from different cells, information may be borrowed through joint analyses to improve the identification of both topologically associating domains (TADs) and regions with different structures. This project proposes to develop a suite of new statistical methods that use the reproducibility information provided by replicate samples to select reliable identifications and to improve the accuracy of peak calling and TAD calling. Furthermore, it proposes a joint analysis framework to identify condition-specific architectural differences across different cells.
Aim 1 will develop statistical methods to evaluate the reproducibility of identified chromatin loops and to select reproducible identifications. The reproducibility-based selection criterion complements the usual measure of significance on a single sample, but has the benefit of being comparable across data sets, protocols and different measures of significance.
Aim 2 will develop robust, joint multi-sample peak calling and TAD calling methods. These methods will allow one to synergize information across samples and properly take account of variations across replicates, ultimately improving the power of the analysis and reducing false positives.
Aim 3 will develop statistical methods for detecting TAD and other architectural differences between different cell types, cellular conditions, or disease status. Included in each proposed Aim are rigorous evaluations of the output of these methods utilizing orthogonal epigenomic data and experimental tests of hypotheses derived from the results of the analytical methods. These methods will enable users to generate reliable and robust scientific interpretation, and ultimately advance the understanding of nuclear organization and its role in gene expression and cellular function.
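As a rough illustration of why a reproducibility-based criterion (Aim 1) can be compared across data sets and across different measures of significance, the sketch below keeps only candidate loops that rank near the top in both of two replicates. It is a toy rank-overlap filter, not the statistical method proposed in this project; the function names, loop IDs, scores and the `top_k` cutoff are all hypothetical.

```python
from typing import Dict, List


def rank_descending(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each loop ID to its rank within one replicate (1 = strongest score)."""
    ordered = sorted(scores, key=lambda loop_id: scores[loop_id], reverse=True)
    return {loop_id: i + 1 for i, loop_id in enumerate(ordered)}


def reproducible_loops(rep1: Dict[str, float],
                       rep2: Dict[str, float],
                       top_k: int) -> List[str]:
    """Return loops ranked within the top_k of BOTH replicates.

    Only ranks are used, so the two replicates may report different
    kinds of scores (for example, -log10 p-values in one and
    enrichment scores in the other).
    """
    r1 = rank_descending(rep1)
    r2 = rank_descending(rep2)
    shared = set(r1) & set(r2)  # loops called in both replicates
    return sorted(l for l in shared if r1[l] <= top_k and r2[l] <= top_k)


# Hypothetical scores on two different scales.
rep1 = {"A": 12.0, "B": 9.5, "C": 3.1, "D": 0.4}      # e.g. -log10 p-values
rep2 = {"A": 250.0, "B": 40.0, "C": 310.0, "D": 5.0}  # e.g. enrichment scores

print(reproducible_loops(rep1, rep2, top_k=2))
```

Because the filter depends only on within-replicate ranks, the same cutoff can be applied whether or not the two replicates share a scoring scale. A real analysis would replace the fixed `top_k` with a statistical model of rank consistency.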

Public Health Relevance

The 3D organization of the genome plays an important role in regulating gene expression, and alterations in 3D architecture can lead to cancer or other diseases. Hi-C data provide a genome-wide view of genome architecture, but many challenges remain in their analysis. Our proposed statistical tools will improve the reliability of interpretations derived from Hi-C data, and hence will increase the robustness of the molecular understanding of human diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM109453-06
Application #: 9899253
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul

Project Start: 2013-09-01
Project End: 2023-03-31
Budget Start: 2020-04-01
Budget End: 2021-03-31
Support Year: 6
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: Pennsylvania State University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 003403953

City: University Park
State: PA
Country: United States
Zip Code: 16802

Related projects


NIH 2020 R01 GM	Statistical methods for improving reproducibility and utility of chromatin interaction data Li, Qunhua / Pennsylvania State University
NIH 2019 R01 GM	Statistical methods for improving reproducibility and utility of chromatin interaction data Li, Qunhua / Pennsylvania State University
NIH 2016 R01 GM	Statistical methods for improving reproducibility and utility of sequencing data Li, Qunhua / Pennsylvania State University
NIH 2015 R01 GM	Statistical methods for improving reproducibility and utility of sequencing data Li, Qunhua / Pennsylvania State University
NIH 2014 R01 GM	Statistical methods for improving reproducibility and utility of sequencing data Li, Qunhua / Pennsylvania State University
NIH 2013 R01 GM	Statistical methods for improving reproducibility and utility of sequencing data Li, Qunhua / Pennsylvania State University	$271,736

Publications

Li, Qunhua; Zhang, Feipeng (2018) A regression framework for assessing covariate effects on the reproducibility of high-throughput experiments. Biometrics 74:803-813

Yang, Tao; Zhang, Feipeng; Yardımcı, Galip Gürkan et al. (2017) HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res 27:1939-1949

Zhang, Feipeng; Li, Qunhua (2017) Robust bent line regression. J Stat Plan Inference 185:41-55

Zhang, Feipeng; Li, Qunhua (2017) A Continuous Threshold Expectile Model. Comput Stat Data Anal 116:49-66

Charepalli, Venkata; Reddivari, Lavanya; Radhakrishnan, Sridhar et al. (2017) Pigs, Unlike Mice, Have Two Distinct Colonic Stem Cell Populations Similar to Humans That Respond to High-Calorie Diet prior to Insulin Resistance. Cancer Prev Res (Phila) 10:442-450

Song, C; Pan, X; Ge, Z et al. (2016) Epigenetic regulation of gene expression by Ikaros, HDAC1 and Casein Kinase II in leukemia. Leukemia 30:1436-40

Lyu, Yafei; Li, Qunhua (2016) A semi-parametric statistical model for integrating gene expression profiles across different platforms. BMC Bioinformatics 17 Suppl 1:5

Bailey, Timothy; Krajewski, Pawel; Ladunga, Istvan et al. (2013) Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol 9:e1003326
