A grand challenge of genome research is to use data from DNA proximity ligation methods (such as Hi-C), together with other available data, to create accurate, predictive models of the 3D nuclear organization of the genome and to reveal the functional implications of different genome organizations. The limitations and incompleteness of the data make this a challenging task. For instance, Hi-C data describe only the average genome conformation over a large ensemble of cells, but the spatial genome organization is highly dynamic and varies among individual cells in the same sample. Moreover, the data cannot reveal any higher-order information, such as co-occurrences of interactions in the same cell. Even single-cell Hi-C approaches are hampered by low interaction coverage per cell and by sampling that is too limited to be statistically meaningful given the large conformational variability among genomes. In addition, the dynamic nature of the genome makes it very challenging to find a comprehensive description of genome structure/function relationships by mining functionally relevant structural chromatin patterns. Therefore, there is an urgent demand for computational methods that can appropriately interpret Hi-C data for 3D genome modeling and analysis and integrate these data with any other available information about genome organization, for example from imaging and other technologies. We propose a new population-based modeling approach, which reframes the problem of optimizing a genome structure population as a maximum a posteriori probability estimation problem. Our method can deconvolute ensemble-based Hi-C data into a population of genome structures that are collectively statistically consistent with the input data and that represent the best approximation of the true genome structure population given the available data.
Our probabilistic approach provides a framework for comprehensive integration of all available data, including ensemble-average and single-cell Hi-C data, as well as other experimental data sources (e.g. imaging), to increase the coverage, accuracy and resolution of the predictive genome models. We also develop a graph mining approach for chromatin pattern discovery in an ensemble of genome structures and relate these patterns to a variety of nuclear processes, such as transcription, translocation, and DNA replication.
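
The point that ensemble Hi-C reports only population averages, and therefore cannot distinguish co-occurring from mutually exclusive interactions, can be illustrated with a toy example. The sketch below is not the proposed modeling method; the loci, the two hypothetical populations, and the helper names `contact_frequencies` and `co_occurrence` are assumptions made purely for illustration.

```python
def contact_frequencies(population):
    """Ensemble view: fraction of structures in which each locus pair is in contact."""
    n = len(population)
    pairs = set().union(*population)
    return {pair: sum(pair in s for s in population) / n for pair in sorted(pairs)}


def co_occurrence(population, pair_a, pair_b):
    """Single-structure view: fraction of structures containing both contacts at once."""
    n = len(population)
    return sum(pair_a in s and pair_b in s for s in population) / n


# Hypothetical locus pairs; each structure is modeled as a set of contacts.
A = ("a", "b")
B = ("c", "d")

# Population 1: the two contacts always form together (in half of the cells).
pop_together = [{A, B}, {A, B}, set(), set()]
# Population 2: the two contacts never form in the same cell.
pop_exclusive = [{A}, {A}, {B}, {B}]

# Both populations yield the same ensemble-average contact frequencies ...
print(contact_frequencies(pop_together))
print(contact_frequencies(pop_exclusive))
# ... but they differ in how often the two contacts co-occur.
print(co_occurrence(pop_together, A, B))
print(co_occurrence(pop_exclusive, A, B))
```

Because both toy populations produce identical averaged contact frequencies, any method that recovers a structure population from ensemble data needs additional assumptions or data sources to choose between them, which is the motivation for the data integration described above.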

Agency: National Institutes of Health (NIH)
Institute: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 5U54DK107981-05
Application #: 9783782
Study Section: Special Emphasis Panel (ZRG1)
Project Start:
Project End:
Budget Start: 2019-08-01
Budget End: 2020-07-31
Support Year: 5
Fiscal Year: 2019
Total Cost:
Indirect Cost:

Name: University of Southern California
Department:
Type:
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089

Kim, Seung Joong; Fernandez-Martinez, Javier; Nudelman, Ilona et al. (2018) Integrative structure and functional anatomy of a nuclear pore complex. Nature 555:475-482
Caridi, Christopher P; Delabaere, Laetitia; Tjong, Harianto et al. (2018) Quantitative Methods to Investigate the 4D Dynamics of Heterochromatic Repair Sites in Drosophila Cells. Methods Enzymol 601:359-389
Hua, Nan; Tjong, Harianto; Shin, Hanjun et al. (2018) Producing genome structure populations with the dynamic and automated PGS software. Nat Protoc 13:915-926
Dultz, Elisa; Mancini, Roberta; Polles, Guido et al. (2018) Quantitative imaging of chromatin decompaction in living cells. Mol Biol Cell 29:1763-1777
Zhu, Yina; Gong, Ke; Denholtz, Matthew et al. (2017) Comprehensive characterization of neutrophil genome topology. Genes Dev 31:141-153
Joseph, Agnel Praveen; Polles, Guido; Alber, Frank et al. (2017) Integrative modelling of cellular assemblies. Curr Opin Struct Biol 46:102-109
Li, Qingjiao; Tjong, Harianto; Li, Xiao et al. (2017) The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol 18:145
Dai, Chao; Li, Wenyuan; Tjong, Harianto et al. (2016) Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities. Nat Commun 7:11549
Tjong, Harianto; Li, Wenyuan; Kalhor, Reza et al. (2016) Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc Natl Acad Sci U S A 113:E1663-72
Shin, Hanjun; Shi, Yi; Dai, Chao et al. (2016) TopDom: an efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Res 44:e70