DNA methylation, an epigenetic modification affecting the organization and function of the genome, plays a critical role in both normal development and disease. Bisulfite-based conversion of unmethylated Cs to Ts followed by deep sequencing (BS-seq) has emerged as the gold standard for studying genome-wide DNA methylation at single-nucleotide resolution. While progress in next-generation sequencing (NGS) makes whole-genome BS-seq (WGBS) increasingly affordable, interpreting the resulting massive amounts of data requires efficient bioinformatics methods. In this proposal, we will develop a series of novel bioinformatics methods for BS-seq data analysis. First, building on the early success of our BSMAP program, we will develop a next-generation bisulfite aligner. We will construct a bisulfite- and SNP-aware genome index for read mapping using IUPAC codes and a dynamic Burrows-Wheeler transform (DBWT). We will also distinguish CpG methylation from C/T SNPs and use GPU hardware acceleration to improve mapping speed. Second, we will develop a powerful differential methylation analysis algorithm that accounts for both sampling variation from sequencing and biological variation between replicates. We will also introduce a novel metric for evaluating both the statistical and the biological significance of differential methylation. The model will have sufficient power to detect differential methylation at single-CpG resolution in low-CpG-density regulatory regions, such as enhancers, at sequencing depths as low as 5- to 10-fold. Third, we will develop a comprehensive BS-seq data analysis pipeline using the Galaxy web interface and cloud computing. We will integrate all the BS-seq tools we are developing, along with other public algorithms, on a continuous basis according to the emerging needs of the epigenetics community. This pipeline will empower experimental biologists to perform most analyses on their own.
These bioinformatics methods will undergo extensive testing and experimental validation by our collaborators. Although this proposal focuses on CpG methylation measured by conventional BS-seq, our methods can be applied directly to modified BS-seq protocols, such as oxBS-Seq and TAB-Seq, recently developed for 5mC and 5hmC, respectively. Finally, as a case study, we will apply these new methods to unravel the in vivo role of DNA methylation in hematopoietic malignancies. These experiments and follow-up validations will also enable us to improve the efficacy of our bioinformatics methods.
Bisulfite sequencing is a powerful technology for studying how gene activity is controlled in normal cells and in human diseases. We propose to develop novel computational methods urgently needed by scientists to analyze large-scale bisulfite sequencing data sets. This will provide valuable insights into the personalized diagnosis and treatment of many human diseases.