Hi-C is currently the most popular assay used to probe 3D chromatin organization within the cell genome-wide. Because loci far away in 1D genomic distance are often packed close together in 3D space, enhancer-promoter interactions can occur between distal regions of the genome. Importantly, this genome structure is well-conserved across cell types and even species, and dysregulation of this structure has been implicated as a source of aberrant gene expression associated with diseases such as Alzheimer's, autoimmune disorders, and cancer. Thus, it is necessary that powerful methods be made available to pinpoint differential interactions between healthy and diseased cells in order to accurately identify new sources of pathogenesis and potential pathways for treatment. Analysis of Hi-C data is challenging because the unique spatial structure in the data, which implies both a 1D genomic distance dependence and a 3D spatial dependence, requires careful attention. Statistical tools that do not account for these dependencies suffer from reduced power to detect interactions, especially those between distal chromosomal regions. Further, methods for differential peak detection between a pair of Hi-C datasets are underdeveloped, and methods that scale to multiple joint comparisons are wholly missing. I propose to address these problems by developing a statistically rigorous methodology for detecting differential peaks in Hi-C data that both accounts for Hi-C's hallmark spatial dependence structure and scales to multiple joint comparisons across biological conditions (i.e. cell types, cell lineages, or experimental and control types). I hypothesize that this approach will greatly boost power to detect differential interactions in Hi-C samples. Moreover, with software made available to the public, scientists will be able to apply these tools to identify new drivers of pathogenesis, ultimately benefiting human health.
I will conduct this work under the close guidance of a sponsor and a co-sponsor who bring, respectively, statistical and biological expertise in Hi-C data.

Public Health Relevance

It is of increasing interest to learn how the 3D organization of the genome affects transcriptional control within the cell using Hi-C data. However, analysis of this data is difficult, and the necessary computational tools are lacking. This proposal aims to develop statistical tools and computational software to interrogate differences in 3D genome architecture between various Hi-C samples in order to better characterize sources of heterogeneity in cells across different conditions.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
5F31HG010574-02
Application #
9904123
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gatlin, Tina L
Project Start
2019-04-01
Project End
2021-03-31
Budget Start
2020-04-01
Budget End
2021-03-31
Support Year
2
Fiscal Year
2020
Name
Pennsylvania State University
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
003403953
City
University Park
State
PA
Country
United States
Zip Code
16802