The primary objective of this grant is to develop and evaluate methods for the statistical analysis of DNA methylation data, with the ultimate goal of understanding the joint behavior of DNA methylation with genotype, copy number variation, and gene expression. A wide variety of technologies are available for studying DNA methylation (see Laird 2010 for a review). We focus on statistical method development for two different platforms provided by Illumina, Inc. All our aims are motivated by ongoing studies at the University of Southern California Epigenome Center. Specifically, we propose the following:
Specific Aim 1: To develop and evaluate preprocessing methods for Illumina's Infinium HumanMethylation BeadArrays using technical replicates and mixed samples.
a. To develop a fast Gamma-Gamma convolution model to correct for background fluorescence, and to compare it with state-of-the-art methods;
b. To extend background correction methods to stratify by GC content;
c. To provide code for data preprocessing in Bioconductor.
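To make Aims 1a and 1b concrete, the following Python sketch computes a background-corrected intensity under one reading of a Gamma-Gamma convolution model. It assumes the observed intensity is X = S + B, where the true signal S ~ Gamma(sig_shape, sig_rate) and the background B ~ Gamma(bg_shape, bg_rate) are independent, and it takes the posterior mean E[S | X = x] as the corrected value. The convolution integral is evaluated by brute-force numerical quadrature, so this is a slow reference calculation, not the fast model the aim proposes. GC stratification is illustrated by fitting a separate background Gamma, by the method of moments, to negative-control intensities within equal-width GC-content bins. The function names, the (gc_fraction, intensity) input format, the number of strata, and the moment estimators are all hypothetical choices for illustration.

```python
import math
import statistics


def gamma_logpdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def corrected_signal(x, sig_shape, sig_rate, bg_shape, bg_rate, n=2000):
    """Posterior mean E[S | X = x] for X = S + B, with independent
    S ~ Gamma(sig_shape, sig_rate) and B ~ Gamma(bg_shape, bg_rate).

    The integral over s in (0, x) uses the midpoint rule, which never
    evaluates log(0) at the endpoints; weights are rescaled by their
    maximum on the log scale to avoid underflow.
    """
    h = x / n
    grid = [(i + 0.5) * h for i in range(n)]
    logw = [gamma_logpdf(s, sig_shape, sig_rate)
            + gamma_logpdf(x - s, bg_shape, bg_rate) for s in grid]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    return sum(s * wi for s, wi in zip(grid, w)) / sum(w)


def fit_background_by_gc(controls, n_strata=3):
    """Hypothetical GC-stratified background fit.

    controls: list of (gc_fraction, intensity) pairs from negative-control
    probes. Returns {stratum_index: (shape, rate)} from a method-of-moments
    Gamma fit within each equal-width GC bin on [0, 1].
    """
    strata = {}
    for gc, intensity in controls:
        k = min(int(gc * n_strata), n_strata - 1)
        strata.setdefault(k, []).append(intensity)
    fits = {}
    for k, values in strata.items():
        m = statistics.mean(values)
        v = statistics.variance(values)  # sample variance; needs >= 2 values
        fits[k] = (m * m / v, m / v)     # shape = m^2 / v, rate = m / v
    return fits
```

As a sanity check, when signal and background share the same rate, S / X follows a Beta(sig_shape, bg_shape) distribution independent of X, so E[S | X = x] = x * sig_shape / (sig_shape + bg_shape); corrected_signal(500.0, 2.0, 0.01, 3.0, 0.01) should therefore return a value very close to 200. In a stratified correction, each probe would use the (bg_shape, bg_rate) fitted for its own GC stratum.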
Specific Aim 2: To develop and evaluate statistical tools for exploring condition-specific variation in DNA methylation.
a. To develop novel filters for selecting loci for cluster analysis that model the outcome, the proportion of DNA methylation, as following a Beta distribution whose variance is a function of its mean;
b. To develop a method for detecting differential methylation using spatial smoothing and the fused lasso.
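One simple way to build a filter that respects the mean-variance relationship described in Aim 2a: if a locus's methylation proportion follows a Beta distribution with mean m and precision phi, its variance is m(1 - m) / (1 + phi), so raw variance is automatically small for loci near 0 or 1. The Python sketch below ranks loci by sample variance divided by m(1 - m), which estimates 1 / (1 + phi), and keeps the top k. This standardized-variance score is an illustrative stand-in chosen for the example, not necessarily the filter the project develops; the function names and the input layout are hypothetical.

```python
import statistics


def standardized_variance(betas):
    """Sample variance of one locus's beta values divided by m * (1 - m), the
    largest variance any proportion with mean m can have. Under a Beta model
    with mean m and precision phi, this ratio estimates 1 / (1 + phi)."""
    m = statistics.mean(betas)
    v = statistics.variance(betas)  # sample variance; needs >= 2 samples
    denom = m * (1.0 - m)
    return v / denom if denom > 0 else 0.0  # all-0 or all-1 loci carry no signal


def filter_loci(data, k):
    """Return the ids of the k loci with the largest standardized variance.
    data maps a locus id to its list of beta values across samples."""
    ranked = sorted(data, key=lambda locus: standardized_variance(data[locus]),
                    reverse=True)
    return ranked[:k]
```

For example, with loci A = [0.1, 0.9, 0.1, 0.9], B = [0.45, 0.55, 0.45, 0.55] and C = [0.02, 0.10, 0.02, 0.10], filter_loci keeps A and C, whereas ranking on raw variance would keep A and B: C varies little in absolute terms but a lot relative to what is possible so close to 0.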
Specific Aim 3: To develop and evaluate methods for processing whole-genome bisulfite sequencing data.
a. To develop and evaluate a novel model-based SNP genotype caller for bisulfite sequencing data that simultaneously extracts DNA methylation levels for downstream analysis;
b. To calibrate our model for known biases in bisulfite conversion and for sequencing errors, using control data sets of in vitro methylated and unmethylated DNA (M.SssI-treated and whole-genome-amplified (WGA) DNA, respectively).

We will apply the methods developed in Aims 1-3 to DNA methylation data generated at the USC Epigenome Center in studies of cancer, neurological disorders, and autoimmune disease, and make user-friendly, open-source computational tools publicly available.
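The key difficulty behind Aim 3a is that a C-to-T SNP and an unmethylated, bisulfite-converted C both appear as T in reads from the strand carrying the C. Reads from the opposite strand separate the two cases, because bisulfite leaves the complementary G unchanged while a SNP changes it to A. The Python sketch below is a minimal Bayesian caller for a single reference-C position in a directional library, in the spirit of, but not identical to, the proposed model. It assumes independent reads; that each read in a heterozygote comes from either allele with probability 1/2; a fixed bisulfite conversion rate conv and a symmetric sequencing error rate err (quantities Aim 3b would calibrate from the in vitro methylated and unmethylated controls; the defaults here are assumptions); illustrative genotype priors; and a uniform prior on the methylation level m, evaluated on a grid. All names and defaults are hypothetical.

```python
import math

GENOTYPES = ("CC", "CT", "TT")  # diploid genotypes at a top-strand reference C


def _loglik_grid(counts, genotype, conv, err, grid):
    """Log-likelihood of the read counts for one genotype at each methylation
    level m in grid. counts = (top C, top T, bottom G, bottom A)."""
    n_c, n_t, n_g, n_a = counts
    out = []
    for m in grid:
        p_c = 0.0  # P(a top-strand read shows C)
        p_g = 0.0  # P(a bottom-strand read shows G)
        for allele in genotype:  # each allele contributes half of the reads
            if allele == "C":
                # the C survives bisulfite if methylated or left unconverted
                q = m + (1.0 - m) * (1.0 - conv)
                p_c += 0.5 * ((1.0 - err) * q + err * (1.0 - q))
                p_g += 0.5 * (1.0 - err)  # complementary G is untouched
            else:  # T allele: C or G is only seen through sequencing error
                p_c += 0.5 * err
                p_g += 0.5 * err
        out.append(n_c * math.log(p_c) + n_t * math.log(1.0 - p_c)
                   + n_g * math.log(p_g) + n_a * math.log(1.0 - p_g))
    return out


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def call_site(counts, conv=0.99, err=0.01, priors=None, n_grid=101):
    """Return (genotype posteriors, best genotype, posterior mean methylation)."""
    if priors is None:  # illustrative priors favoring the reference genotype
        priors = {"CC": 1.0 - 1.5e-3, "CT": 1e-3, "TT": 5e-4}
    grid = [k / (n_grid - 1) for k in range(n_grid)]  # uniform prior on m
    loglik = {g: _loglik_grid(counts, g, conv, err, grid) for g in GENOTYPES}
    # log P(g) + log of the likelihood averaged over the grid of m values
    log_post = {g: math.log(priors[g]) + _logsumexp(loglik[g]) - math.log(n_grid)
                for g in GENOTYPES}
    norm = _logsumexp(list(log_post.values()))
    post = {g: math.exp(lp - norm) for g, lp in log_post.items()}
    best = max(post, key=post.get)
    if "C" not in best:
        return post, best, None  # no C allele, so no methylation to report
    top = max(loglik[best])
    w = [math.exp(v - top) for v in loglik[best]]
    meth = sum(m * wi for m, wi in zip(grid, w)) / sum(w)
    return post, best, meth
```

With ten top-strand T reads, call_site((0, 10, 10, 0)) favors CC with low methylation because the bottom strand still shows G, while call_site((0, 10, 0, 10)) favors TT. A fuller caller would also use base qualities, sequence context, and calibrated, possibly position-specific error rates.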

Public Health Relevance

In humans, epigenetic variation permits cells with identical genomes to specialize in function. This variation is an intermediate phenotype, affected by exposures, and predictive of disease and outcome. DNA methylation is the most commonly studied epigenetic mark, found to be aberrant in cancer, autoimmune and neurological disorders. We propose to develop statistical methods for the analysis of DNA methylation measured using new high-throughput microarray and sequencing technologies, for a better understanding of its role in human disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG006705-01A1
Application #: 8440116
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa
Project Start: 2013-03-26
Project End: 2016-02-29
Budget Start: 2013-03-26
Budget End: 2014-02-28
Support Year: 1
Fiscal Year: 2013
Total Cost: $365,544
Indirect Cost: $142,596
Organization: University of Southern California
Department: Public Health & Prev Medicine
Organization Type: Schools of Medicine
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089
Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori et al. (2014) Non-specific filtering of beta-distributed data. BMC Bioinformatics 15:199