The primary objective of this grant is to develop and evaluate methods for the statistical analysis of DNA methylation data, with the ultimate goal of understanding the joint behavior of DNA methylation with genotype, copy number variation, and gene expression. A wide variety of technologies are available for studying DNA methylation (see Laird 2010 for a review). We focus on statistical method development for two different platforms provided by Illumina, Inc. All our aims are motivated by ongoing studies at the University of Southern California Epigenome Center. Specifically, we propose the following:
Specific Aim 1: To develop and evaluate preprocessing methods for Illumina's Infinium HumanMethylation BeadArrays using technical replicates and mixed samples. a. To develop a fast Gamma-Gamma convolution model to correct for background fluorescence, and compare it with state-of-the-art methods; b. To extend background correction methods to stratify by GC content; c. To provide code for data preprocessing in Bioconductor.
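As a rough illustration of the background-correction idea in Aim 1a, the sketch below treats an observed probe intensity as the sum of a Gamma-distributed signal and an independent Gamma-distributed background, and returns the conditional mean of the signal given the observation. It is a generic additive convolution model evaluated by brute-force numerical integration, not the fast method the aim proposes. The function names and all shape and scale values are hypothetical, and in practice the parameters would have to be estimated from the data.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def expected_signal(y, sig_shape, sig_scale, bg_shape, bg_scale, steps=2000):
    """Approximate E[S | S + B = y] for independent Gamma signal S and background B.

    Uses a midpoint rule on (0, y); the midpoints avoid both endpoints,
    where the densities can vanish or blow up.
    """
    if y <= 0:
        return 0.0
    h = y / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        s = (i + 0.5) * h  # candidate signal value
        weight = gamma_pdf(s, sig_shape, sig_scale) * gamma_pdf(y - s, bg_shape, bg_scale)
        numerator += s * weight
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0
```

When the two components share a scale parameter, the conditional mean has the closed form y * k_s / (k_s + k_b), where k_s and k_b are the signal and background shapes. For example, `expected_signal(500.0, 2.0, 100.0, 3.0, 100.0)` should be close to 200, which gives a convenient sanity check on the integration.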
Specific Aim 2: To develop and evaluate statistical tools for exploring condition-specific variation in DNA methylation. a. To develop novel filters for selecting loci for cluster analysis that treat the outcome, the proportion of DNA methylation, as following a Beta distribution whose variance is a function of the mean; b. To develop a method for detecting differential methylation using spatial smoothing and the fused lasso.
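To make the mean-variance relationship in Aim 2a concrete: a Beta distribution with mean mu and precision phi (the sum of its two shape parameters) has variance mu(1 - mu)/(1 + phi), so loci with means near 0.5 can show larger raw variance than loci near 0 or 1 even when they are no more dispersed. The sketch below is one illustrative filter statistic, the sample variance divided by mu(1 - mu). It is an assumption made for this example and not necessarily the filter the project develops; the function names and locus labels are hypothetical.

```python
from statistics import mean, variance


def beta_variance(mu, phi):
    """Variance of a Beta distribution with mean mu and precision phi."""
    return mu * (1.0 - mu) / (1.0 + phi)


def scaled_variance(values):
    """Sample variance divided by mu * (1 - mu).

    For Beta-distributed data this estimates 1 / (1 + phi), which does not
    depend on the mean.
    """
    mu = mean(values)
    bound = mu * (1.0 - mu)
    if bound <= 0:
        return 0.0  # all values are 0 or all are 1: no variation to score
    return variance(values) / bound


def select_loci(beta_values, top_n):
    """Return the top_n locus names ranked by scaled variance, highest first.

    beta_values maps a locus name to its list of methylation proportions.
    """
    scores = {locus: scaled_variance(vals) for locus, vals in beta_values.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With the toy input `{"locus_mid": [0.45, 0.50, 0.55, 0.50], "locus_low": [0.03, 0.09, 0.03, 0.09]}`, ranking by raw variance would place `locus_mid` first, whereas the scaled statistic places `locus_low` first, because its variation is large relative to what its mean allows.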
Specific Aim 3: To develop and evaluate methods for processing whole-genome bisulfite-seq data. a. To develop and evaluate a novel model-based SNP genotype caller for bisulfite sequence data. This tool will simultaneously extract DNA methylation content for downstream analysis; b. To calibrate our model for known biases in bisulfite conversion and sequencing errors using control data sets of in vitro methylated and unmethylated DNA (M.SssI-treated and WGA).
We will apply the methods developed in Aims 1-3 to DNA methylation data generated at the USC Epigenome Center in studies of cancer, neurological disorder, and autoimmune disease, and make user-friendly, open-source computational tools publicly available.
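The reason genotype and methylation can be called jointly, as in Aim 3a, is that bisulfite treatment converts unmethylated C to T on the strand carrying the cytosine, while the G on the opposite strand is unaffected. A T read on the C strand is therefore ambiguous between an unmethylated C and a C>T SNP, but reads from the opposite strand separate the two. The sketch below is a deliberately simplified likelihood for a single reference-C site, assuming two read classes per strand, a symmetric error rate, a grid search over the methylation level, and no priors. It is not the caller the project proposes, and every name and default value in it is hypothetical.

```python
import math


def p_forward_c(allele, meth, conv, err):
    """P(a forward-strand read shows C) given the true allele."""
    # A C allele stays C if it is methylated or escapes conversion.
    p_c = meth + (1.0 - meth) * (1.0 - conv) if allele == "C" else 0.0
    return p_c * (1.0 - err) + (1.0 - p_c) * err


def p_reverse_g(allele, err):
    """P(a reverse-strand read shows G) given the true forward-strand allele."""
    p_g = 1.0 if allele == "C" else 0.0  # G is not altered by bisulfite
    return p_g * (1.0 - err) + (1.0 - p_g) * err


def log_likelihood(genotype, fwd_c, fwd_t, rev_g, rev_a, meth, conv=0.99, err=0.01):
    """Binomial log-likelihood of strand-specific read counts for a genotype."""
    # Each read is assumed to come from either allele with probability 1/2.
    pc = sum(p_forward_c(a, meth, conv, err) for a in genotype) / 2.0
    pg = sum(p_reverse_g(a, err) for a in genotype) / 2.0
    return (fwd_c * math.log(pc) + fwd_t * math.log(1.0 - pc)
            + rev_g * math.log(pg) + rev_a * math.log(1.0 - pg))


def call_site(fwd_c, fwd_t, rev_g, rev_a, conv=0.99, err=0.01):
    """Return the (genotype, methylation level) pair with the highest likelihood."""
    best = None
    for genotype in ("CC", "CT", "TT"):
        # Methylation only matters when at least one allele is C.
        grid = [i / 20 for i in range(21)] if "C" in genotype else [0.0]
        for meth in grid:
            ll = log_likelihood(genotype, fwd_c, fwd_t, rev_g, rev_a, meth, conv, err)
            if best is None or ll > best[0]:
                best = (ll, genotype, meth)
    return best[1], best[2]
```

For instance, 20 forward-strand T reads with 20 reverse-strand G reads are best explained by an unmethylated homozygous C, while the same forward reads with 20 reverse-strand A reads are best explained by a homozygous T. The conversion and error rates are fixed here; calibrating them against control data is the subject of Aim 3b.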
In humans, epigenetic variation permits cells with identical genomes to specialize in function. This variation is an intermediate phenotype, affected by exposures and predictive of disease and outcome. DNA methylation is the most commonly studied epigenetic mark and has been found to be aberrant in cancer and in autoimmune and neurological disorders. We propose to develop statistical methods for the analysis of DNA methylation measured using new high-throughput microarray and sequencing technologies, for a better understanding of its role in human disease.
Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori et al. (2014) Non-specific filtering of beta-distributed data. BMC Bioinformatics 15:199