Despite many studies on the mechanisms of DNA double-strand break (DSB) formation, our knowledge of them remains very incomplete. To date, DSB formation has been extensively studied only at specific loci and remains largely unexplored at the genome-wide level. Two gaps account for this: no systematic, genome-wide studies have objectively tested and compared the proposed mechanisms of DSB formation, and no high-resolution genome-wide maps of DSBs obtained by direct DSB labeling have been available to validate them. Working with collaborators, we have recently developed a method for in situ DSB labeling followed by deep sequencing (BLESS), and used it to map DSBs in human cells with a resolution 2-3 orders of magnitude better than previously achieved. Our results show that hypothesis-driven analysis of the genomic regions that BLESS identifies at high resolution can help explore the basis of genomic instability genome-wide. We discovered that DSBs occur most often in regions that form DNA secondary structures or are highly transcribed. Both features may cause collapse of the replication fork, eventually leading to DSBs: the former via fork stalling on DNA secondary structures, the latter through replication-transcription collisions (RTCs) or the formation of RNA-DNA hybrids (R-loops). We therefore hypothesize that the majority of the observed DSBs can be attributed to at least one of three main, non-mutually exclusive endogenous causes: fork collapse due to 1) stalling on DNA secondary structures or 2) RTCs, or 3) co-transcriptional R-loop formation. We will test this hypothesis and clarify the relative importance of these mechanisms by pursuing three Specific Aims: 1) Quantify how fork stalling on DNA secondary structures impacts DSB formation; 2) Estimate the contribution of RTCs to DSB formation; and 3) Clarify the influence of R-loops on DSB formation. The work proposed in this application is primarily computational.
The main innovation of this project lies in developing predictive models that will provide the first comprehensive evaluation of the contributions of fork stalling, RTCs and R-loop formation to genomic instability under various conditions in human cells. To construct such models, and to gather the data needed both to inform and to verify them, we will combine several cutting-edge computational and molecular biology methods. The computational methods will be mostly adapted from theoretical physics, and the experimental methods will include DNA combing, ChIP-Seq and the novel DRIP-Seq method for R-loop detection, in addition to our BLESS method. We expect that our research will reveal a complex and nuanced picture of the mechanisms and context of DSB formation in human cells. It should move the field from studying individual examples of DSBs to a systematic, genome-wide understanding of DSB formation mechanisms and a quantification of their relative importance. Such progress should eventually allow the use of DSB localization signatures for diagnostic and prognostic purposes. We will also provide powerful software tools, experimental methods and rich datasets for future studies that go beyond the DNA repair and replication fields.
Francis Collins identified developing high-throughput technology as one of five areas of focus for NIH's research agenda, and one of the NHGRI's strategic goals is achieving maximal sequencing data accuracy as a prerequisite for clinical applications of sequencing methods. Our project addresses both of these goals: it takes a very promising, novel method for direct DSB detection in vivo and improves its accuracy through computational modeling, making it more useful for answering fundamental biological questions. This work will eventually enable the use of this method to generate high-resolution spatial genomic instability maps for diagnostic and prognostic purposes.