Large numbers of complete methylomes are being acquired through clinical sequencing projects such as The Cancer Genome Atlas, the Blueprint Epigenome Project, and the International Cancer Genome Consortium. Furthermore, third-generation nanopore sequencers, which detect DNA methylation and genetic variation in a single experiment, are nearly ready for routine clinical sequencing and will provide complete methylomes for all patients for whom whole-genome sequencing is indicated. Current analysis tools, however, only perform preliminary methylome processing and catalogue differentially methylated regions (DMRs). To transform methylome analysis into a clinically useful diagnostic or prognostic test, we need predictive tools that interpret the functional and pathological consequences of identified methylation changes. Towards this goal, we have published a series of papers demonstrating that machine-learning-based models utilizing high-resolution signatures of all methylation changes around a promoter vastly outperform conventional DMR methods. Our models accurately predict expression states at genes potentially regulated by methylation and reveal predictive methylation signatures that facilitate mechanistic interpretation. Nonetheless, several challenges remain before we can achieve our goal of translating genome-wide methylation data for routine clinical use: (1) To our knowledge, no current models integrate distal enhancers, whose activation is affected by DNA methylation. Such integrative analysis is necessary to understand the consequences of methylation changes in cancers, whose genomes frequently undergo widespread methylation changes. In addition, such modelling will be essential to understand the role of 5-hydroxymethylcytosine (5hmC), which may play both repressive and activating roles in neurons depending on whether it is found at promoters or enhancers.
(2) Our current models (and conventional approaches) represent methylation data independently of DNA sequence, despite mechanistic studies demonstrating that methylation changes can have different functional effects depending on which sequences change and on the context of the local regulatory grammar. In this proposal, we will meet these challenges by first developing a predictive model that incorporates 5-methylcytosine and 5hmC at promoters and enhancers to determine how these marks act in concert. In particular, we will examine the hypothesized dual role of 5hmC as a repressor at promoters and as an activator at enhancers in cortical neurons. We will then use new advances in natural language processing to jointly model DNA sequence and methylation to predict expression states. Our results will reveal which regulatory elements and transcription factor binding sites are affected by DNA methylation and how changes at different sites combine to affect expression. We will experimentally validate our in silico predictions using a combination of reporter assays and CRISPR-based epigenome-editing tools. Thus, the software tools we develop will form an important toolkit for the analysis and mechanistic interpretation of whole-genome methylation studies, in both the laboratory and the clinic.
Public Narrative: DNA normally undergoes a variety of chemical modifications, including the addition of methyl and hydroxymethyl groups to cytosines (DNA methylation). Though we are unsure how to interpret them, abnormal patterns of DNA methylation are often found in human diseases and will likely soon be routinely measured in the clinic. In this proposal, we will develop advanced computational approaches to predict the functional and pathological consequences of DNA methylation changes in disease. In the future, the approaches we develop will lead to software that suggests individualized therapies for patients based on clinical methylation data.