Changes in patterns of DNA methylation, a modulator of gene expression, play a key role in development and disease. There has been a flurry of development of new experimental methods to accurately map 5-methylcytosine and 5-hydroxymethylcytosine genome-wide. However, development of computational techniques to interpret these data has lagged behind. Current analytical methods are mostly concerned with visualization of genome-level correlations between DNA methylation and other epigenetic marks or with the identification of differentially methylated regions between samples. These tools have performed poorly when trying to link methylation and expression changes at specific loci, and they find only weak genome-wide correlations between the two. We hypothesize that such discrepancies are due to limitations of the current analysis methods, which tend to oversimplify local DNA methylation signals. We propose to develop a new suite of tools using methods from computer vision to associate spatially similar methylation changes with corresponding changes in transcription. This approach allows us to make minimal assumptions about what these signals should look like. First, we use Dynamic Time Warping (DTW), a curve similarity metric, to cluster genes together based on their methylation signals. We then find clusters of genes with similar methylation patterns that have coordinated differential expression. We export the patterns defined by these curves for use as a classifier, enabling us to enumerate genes with associated methylation and expression changes. Our model considers the entire shape and course of the signal around the gene promoter, rather than averaging the signal across windows or modeling signals with idealized statistical distributions. We recently reported that our method discovered a variety of DNA methylation patterns associated with gene silencing, and produced longer and markedly higher quality gene lists than those generated by other methods.
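To make the DTW-based clustering step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes each gene is represented by a short list of methylation-difference values in bins around its promoter, uses the textbook dynamic-programming form of DTW with an absolute-difference cost, and groups genes by a simple single-linkage threshold. The gene names, profile values, threshold, and the helper names `dtw_distance` and `cluster_by_dtw` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Textbook DTW between two 1-D signals, using absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_by_dtw(profiles, threshold):
    """Single-linkage grouping (an assumption of this sketch): genes whose
    DTW distance is <= threshold end up in the same cluster."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Union-find root lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(profiles, 2):
        if dtw_distance(profiles[g1], profiles[g2]) <= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical methylation-difference profiles in six bins around a promoter.
profiles = {
    "geneA": [0.0, 0.1, 0.8, 0.9, 0.1, 0.0],
    "geneB": [0.0, 0.0, 0.1, 0.8, 0.9, 0.1],  # same peak, shifted by one bin
    "geneC": [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],  # peak at the upstream edge
}
clusters = cluster_by_dtw(profiles, threshold=0.5)
print(clusters)
```

In this toy input, geneA and geneB carry the same peak offset by one bin. A bin-by-bin comparison sums to roughly 1.8, whereas DTW aligns the two peaks and yields a distance of roughly 0.1, so the two genes fall into one cluster while geneC stays separate. This is the sense in which the approach compares the whole shape of the signal rather than window averages.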
In this proposal we build on these results to address three aims concerning DNA methylation's role in gene silencing: (1) We will build a classifier to model the methylation patterns we discover and use it to examine how DNA hypomethylation, which occurs primarily at poorly annotated alternative and retrotransposon promoters, contributes to transcriptome regulation in breast cancer. (2) We will expand our model to address the controversial role of 5-hydroxymethylcytosine in gene regulation and its interplay with 5-methylcytosine in differentiated cells in the central nervous system. (3) We will develop general rules governing the methylation signatures we have identified and experimentally validate their functionality. This proposal will yield a set of experimentally validated models for how DNA methylation contributes to gene silencing as well as a set of computational tools other researchers can use to analyze the role of methylation in human disease.
DNA methylation is an important regulator of gene expression. Abnormal patterns of methylation are often found in cancer and other human diseases. We will develop a set of computational tools, inspired by methods from computer vision, to identify DNA methylation patterns associated with either gene silencing or activation from genomic data. We will elucidate and experimentally validate general rules governing these relationships. All software will be made publicly available to broadly advance cancer research nationwide. Insights obtained through this work will improve our mechanistic understanding of DNA methylation and assist in the identification of patients who can benefit from DNA methylation-targeted therapies.