Recent advances in next-generation sequencing (NGS)-based molecular methods have illuminated the hierarchical organization of the genome and have shown that changes in the epigenome can promote or prevent the access of transcription factors (TFs) to specific DNA sequences, move genes between nuclear compartments, and build or remove the insulation between neighboring genomic regions. Because changes in the epigenome and chromatin organization can derail precise transcriptional regulatory programs, altering cell differentiation status or inducing a pathological state, research in Dr. Li's laboratory seeks to improve our ability to define and understand the impact of such changes across multiple layers of transcriptional regulation in the cell. The laboratory has effectively addressed the regulatory roles of DNA methylation in its previous and ongoing work and now extends its focus to hydroxymethylation. 5-Hydroxymethylcytosine (5hmC) is a key epigenetic modification linked to transcriptional activation; however, 5hmC data and their genomic properties have thus far been evaluated with limited integration of different genomic data types. Moreover, there is no integrative computational framework designed to interpret the functional role of 5hmC in the context of 5-methylcytosine (5mC), enhancer activities, chromatin interactions, gene expression data, and DNA sequence information. This proposal will fill the growing need for user-friendly, interpretable, and extendable tools for mining 5hmC data, laying a foundation for basic mechanistic studies of the epigenome and facilitating the discovery of potential therapeutic targets in disease.
Building on the investigator's progress in revealing the dynamics of 5hmC and its impact on gene regulation, the proposal will now develop innovative computational tools for 5hmC data mining and data integration with other NGS datasets, with a focus on applying these tools to B cell differentiation, cancer, and embryonic stem cell (ESC) differentiation. Key goals over the next five years include developing a computational framework to mine short- and long-read sequencing data to answer the following questions: (1) How does 5hmC contribute to epigenetic heterogeneity? (2) How does 5hmC epigenetic heterogeneity contribute to transcriptome heterogeneity? (3) How do 5hmC levels and epigenetic heterogeneity communicate with histone modifications, enhancer activities, chromatin interactions, and chromatin organization? We will combine machine learning and network mining algorithms to enable knowledge discovery and data integration from diverse genomic data types. We will then harness the 5hmC data-mining framework to identify 5hmC patterns that correlate with ESC differentiation and B cell differentiation and that contribute to the fitness advantage of cancer cells. This work is significant because it will be the first dissection of 5hmC's contribution to local and long-range epigenetic heterogeneity and the first computational framework to uncover the cross-talk between DNA modifications and other transcriptional regulators via chromatin interaction data. Collectively, this work will yield a fuller picture of the molecular events that underlie fundamental changes in cell state and behavior.
RELEVANCE TO PUBLIC HEALTH: Modifications of the genome that do not change the sequence of the DNA, called epigenetic modifications, can impact how genes are expressed and lead healthy cells into a disease state. Here we propose to develop computational tools to define how one type of epigenetic modification, called hydroxymethylation, impacts gene expression and ultimately cell behavior. These tools will be versatile and adaptable to any cell type of choice, enabling research into the molecular features of the genome that underlie both normal and pathological cell behaviors.