The information content of DNA is not limited to the primary sequence (A, C, G, T), but is also conveyed by chemical modifications of individual bases. For example, DNA methylation, specifically 5-methylcytosine (5mC), has been widely studied for its important regulatory roles in human development and disease. In addition, the discovery of active demethylation of 5mC, in which TET enzymes oxidize 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), provided important insights into the dynamic nature of the human methylome and its close relevance to multiple human diseases. Beyond these chemical modifications to cytosine, recent studies by us and others discovered that N6-methyladenine (6mA), another form of methylation previously thought to exist exclusively in bacteria and protozoa, also exists in eukaryotic genomes, including the human genome. In addition to these epigenetic marks, different forms of DNA damage represent another category of DNA chemical modification of important biological relevance. Although a few methods for mapping individual chemical modifications have been developed and some are widely used, it is usually difficult for the broader research community to master a separate protocol for each form of modification. While third-generation sequencing technologies support the direct detection of DNA modifications, they face fundamental challenges in distinguishing among different forms of modification. The objective of this project is to develop a novel technology for the direct, simultaneous mapping of multiple forms of DNA methylation and DNA damage events. The core idea is that each form of nucleic acid modification has a unique signature in its physical interaction with the DNA polymerase or the nanopore used in third-generation sequencing, and that these signatures can be modeled by deep learning methods.
We will develop this technology using multiple innovative strategies to address a few fundamental challenges, and then comprehensively evaluate the technology to facilitate broad applications.
Chemical DNA modifications are crucial components of the human genome that control many important biological processes in human development and disease. In this project, we will develop a novel technology for direct mapping of multiple, specific forms of DNA modification, which will enable us and many other researchers to study the functions of DNA modifications in the human genome more effectively.