Epigenetic regulation of gene expression through variation in DNA methylation plays a critical role in a range of biological processes including cellular differentiation, human disease and cancer. New methods for determining the fine structure of methylation patterns genome-wide have led to an explosion of research in this field. Methylation near the gene promoter is correlated with gene silencing, while unmethylated promoters are potentially active. Current computational methods classify genes as either methylated and silenced or unmethylated and potentially active based on a coarse calculation of CpG methylation state across a window of a few hundred base pairs around the transcription start site. The detailed spatial pattern of methylation across the promoter may be an important determinant of gene expression, but current computational methods ignore this valuable information. Recent work has found regions proximal to CpG island promoters, dubbed "CpG island shores", whose methylation state correlates with transcription. CpG island shores are loosely defined, however, and this concept is difficult to apply in practice. While other methylation signatures are also likely to correlate with gene expression, a general framework to identify and study them has not been established. New computational tools for identification and correlation of detailed methylation patterns with gene expression are critical for advancing our understanding of the epigenetics of gene regulation. We propose to develop software tools to detect new methylation signatures at gene promoters that correlate with expression. This will be done within a formal framework so that these signatures can be used to determine differentially methylated genes in a variety of study designs.
Because methylation levels at nearby CpG sites are highly correlated, we will interpolate methylation data at individual CpG sites in a 10 kb window around the transcription start site (TSS) to yield a methylation signature at each promoter that is independent of primary sequence features. We will then use a metric from topology, the discrete Fréchet distance, to calculate the similarity between methylation signatures, and apply this metric to cluster signatures of similar type. We will then identify clusters whose methylation signatures correlate with expression. Clusters that contain both silenced and expressed genes will be examined to see whether other local primary sequence features can be used to discriminate the genes based on expression. This approach will be used to detect methylation changes in case-control studies and in pairwise comparisons, such as those needed for time-course analysis or for the detection of different states of differentiation. The general framework developed here can be expanded in the future to examine methylation signatures at enhancers and gene bodies, as well as histone marks or other genomic signals that carry detailed spatial information.
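The interpolation and distance steps described above can be illustrated with a minimal sketch. It is not the proposed software: the function names, the use of linear interpolation, the clamping of values beyond the outermost CpG, and the `x_scale` parameter (which rescales genomic position so that it is comparable to methylation levels in the range 0 to 1) are all assumptions made for illustration only.

```python
from bisect import bisect_right


def interpolate_signature(positions, levels, grid):
    """Linearly interpolate methylation levels onto a common grid.

    positions: sorted CpG coordinates relative to the TSS (bp).
    levels:    methylation level (0..1) at each CpG.
    grid:      coordinates at which the signature is sampled.
    Assumption: outside the covered range, the nearest CpG value is used.
    """
    signature = []
    for x in grid:
        if x <= positions[0]:
            signature.append(levels[0])
        elif x >= positions[-1]:
            signature.append(levels[-1])
        else:
            j = bisect_right(positions, x)  # positions[j-1] <= x < positions[j]
            x0, x1 = positions[j - 1], positions[j]
            y0, y1 = levels[j - 1], levels[j]
            signature.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return signature


def signature_points(grid, signature, x_scale=10_000.0):
    """Turn a signature into (x, y) points; x_scale is a hypothetical choice."""
    return [(x / x_scale, y) for x, y in zip(grid, signature)]


def discrete_frechet(p, q):
    """Discrete Frechet distance between two point sequences (dynamic programming)."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance of prefixes
    for i in range(n):
        for j in range(m):
            cost = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], cost)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, cost)
    return ca[n - 1][m - 1]


def pairwise_distances(curves):
    """Symmetric matrix of discrete Frechet distances between all signatures."""
    k = len(curves)
    matrix = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            matrix[a][b] = matrix[b][a] = discrete_frechet(curves[a], curves[b])
    return matrix


if __name__ == "__main__":
    grid = list(range(-5000, 5001, 500))  # 10 kb window centred on the TSS
    # Made-up toy promoters, for illustration only.
    dip = interpolate_signature([-4000, -1000, 0, 1000, 4000],
                                [0.9, 0.8, 0.1, 0.8, 0.9], grid)
    flat = interpolate_signature([-4000, 0, 4000], [0.9, 0.9, 0.9], grid)
    curves = [signature_points(grid, s) for s in (dip, flat)]
    print(pairwise_distances(curves))
```

The resulting distance matrix could then be passed to any standard clustering routine, for example hierarchical clustering, to group promoters with similar signatures. The choice of `x_scale` affects how strongly a positional shift of a feature is penalized relative to a change in methylation level, so it would need to be tuned on real data.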
DNA methylation, the addition of a methyl group to the cytosine in CpG dinucleotides, is an important epigenetic regulator of gene expression in cells. DNA methylation has been implicated in a large number of biological processes including cellular differentiation and cancer. We will develop software tools to analyze data from new techniques for mapping DNA methylation genome-wide, with the goal of discovering new methylation signatures that are associated with silenced and active genes. These will include tools that use the signatures to determine which genes are differentially regulated in different cell types and in human disease.