Chromosome Conformation Capture (3C)-based technologies for detecting long-range DNA interactions have matured recently. Hi-C and its variants are currently the state of the art for genome-wide mapping of long-range interactions. Although there is a fast-growing literature on Hi-C data analysis methods, reads originating from repeat sequences, which can make up a significant proportion of Hi-C reads, are excluded from these analyses. Discarding repetitive genomic content can cause a regulatory element in a mapped region to be erroneously assigned to a target other than its bona fide target. Moreover, regulatory elements residing in repetitive regions can control targets in mapped regions. To address these challenges, we will develop biologically motivated, statistically rigorous approaches for allocating multi-mapping reads in Hi-C data analysis.
The aims will be accomplished through a combination of methodological development, data-driven simulation, computational analysis, and experimental validation. Statistical resources generated from this project will be disseminated as open-source software. Collectively, these aims will significantly enhance the utility of Hi-C data for profiling long-range interactions of repetitive DNA.
Genome-wide data from chromosome conformation capture experiments and their variants provide overwhelming evidence that the three-dimensional organization of chromatin impacts gene regulation and genome function. One critical shortcoming of existing approaches for analyzing these data is that they discard reads that align to multiple locations on the genome. This project seeks to enhance our knowledge of long-range regulatory interactions involving repetitive genomic regions by incorporating such reads into the analysis.