DNA methylation, an epigenetic modification that affects the organization and function of the genome, plays a critical role in both normal development and disease. Bisulfite conversion of unmethylated Cs to Ts followed by deep sequencing (BS-seq) has emerged as the gold standard for studying genome-wide DNA methylation at single-nucleotide resolution. While advances in next-generation sequencing (NGS) make whole-genome BS-seq (WGBS) increasingly affordable, interpreting the resulting massive amount of data requires efficient bioinformatics methods. In this proposal, we will develop a series of novel bioinformatics methods for BS-seq data analysis. First, building on the early success of our BSMAP program, we will develop a next-generation bisulfite aligner. We will construct bisulfite- and SNP-"aware" genome indexing for read mapping using IUPAC codes and a dynamic Burrows-Wheeler transform (DBWT). We will also distinguish CpG methylation from C/T SNPs and use GPU hardware acceleration to improve mapping speed. Second, we will develop a powerful differential methylation analysis algorithm that accounts for both sampling variation from sequencing and biological variation between replicates. We will also introduce a novel metric for evaluating both the statistical and the biological significance of differential methylation. This model will have enough power to detect differential methylation at single-CpG resolution in low-CpG-density regulatory regions, such as enhancers, with sequencing depth as low as 5- to 10-fold. Third, we will develop a comprehensive BS-seq data analysis pipeline using the Galaxy web interface and cloud computing. We will continuously integrate all the BS-seq tools we are developing, along with other public algorithms, according to the emerging needs of the epigenetics community. This pipeline will empower experimental biologists to perform most analyses on their own.
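As a rough illustration of what bisulfite- and SNP-aware matching means, the minimal Python sketch below compares forward-strand (Watson) reads against a reference string that may carry IUPAC ambiguity codes. A read T is accepted wherever the reference allows C, because bisulfite turns unmethylated C into T, and reference C positions are then tallied as methylated (read C) or unmethylated (read T). The function names, the plain-string reference, and the choice to skip Y (known C/T SNP) positions are illustrative assumptions; this is not BSMAP code and does not implement DBWT indexing, reverse-strand reads, or GPU acceleration.

```python
from typing import Dict, List, Tuple

# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bs_match(ref_base: str, read_base: str) -> bool:
    """Hypothetical helper: is read_base compatible with ref_base (Watson strand)?

    Bisulfite converts unmethylated C to T, so a read T is accepted at any
    reference position whose IUPAC code allows C. A read C at a reference T
    is still a mismatch, since conversion never creates C.
    """
    allowed = set(IUPAC[ref_base])
    if "C" in allowed:
        allowed.add("T")
    return read_base in allowed


def count_mismatches(ref: str, read: str, offset: int) -> int:
    """Bisulfite-aware mismatches of a read placed at ref[offset:].

    Read bases that run past the end of the reference are ignored.
    """
    window = ref[offset:offset + len(read)]
    return sum(not bs_match(r, b) for r, b in zip(window, read))


def call_methylation(
    ref: str, reads: List[Tuple[int, str]]
) -> Dict[int, Tuple[int, int]]:
    """Hypothetical caller: {position: (methylated C reads, unmethylated T reads)}.

    Only unambiguous reference C positions are called. IUPAC positions such
    as Y (a known C/T SNP) are skipped, because a read T there could be the
    T allele or a converted unmethylated C.
    """
    counts: Dict[int, List[int]] = {}
    for offset, read in reads:
        for i, base in enumerate(read):
            pos = offset + i
            if pos >= len(ref) or ref[pos] != "C":
                continue
            tally = counts.setdefault(pos, [0, 0])
            if base == "C":
                tally[0] += 1
            elif base == "T":
                tally[1] += 1
    return {pos: (m, u) for pos, (m, u) in counts.items()}
```

For example, with the reference `ACGTYCGA` and the reads `(0, "ACGTTCGA")`, `(0, "ATGTCTGA")` and `(1, "CGTTCG")`, both CpG cytosines (positions 1 and 5) are covered by two C reads and one T read, a methylation level of 2/3. A production aligner would also handle Crick-strand reads, where conversion appears as G-to-A in forward coordinates, and could use the opposite strand to separate true C/T SNPs from bisulfite conversion.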
These bioinformatics methods will undergo extensive testing and experimental validation by our collaborators. Although this proposal focuses on CpG methylation measured by conventional BS-seq, our bioinformatics methods can be applied immediately to modified BS-seq protocols, such as the recently developed oxBS-Seq and TAB-Seq for 5mC and 5hmC, respectively. Finally, as a case study, we will apply these new methods to unravel the in vivo role of DNA methylation in hematopoietic malignancies. These experiments and follow-up validations will also enable us to improve the efficacy of our bioinformatics methods.
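To show how the same per-cytosine counts carry over to modified protocols, the sketch below uses the standard relationship between paired libraries: conventional BS-seq reads both 5mC and 5hmC as C, whereas oxBS-Seq reads only 5mC as C, so the difference of the two levels estimates 5hmC (TAB-Seq instead reads 5hmC as C directly). The function names are hypothetical, and the plain subtraction with clamping at zero is a naive point estimate assumed for illustration, not the statistical model proposed here.

```python
from typing import Optional, Tuple


def methylation_level(c_count: int, t_count: int) -> Optional[float]:
    """Fraction of reads reporting an unconverted C; None if the site is uncovered."""
    total = c_count + t_count
    return c_count / total if total else None


def estimate_5mc_5hmc(
    bs_c: int, bs_t: int, ox_c: int, ox_t: int
) -> Optional[Tuple[float, float]]:
    """Hypothetical naive (5mC, 5hmC) estimates from paired BS-seq and oxBS-Seq counts.

    BS-seq level  ~ 5mC + 5hmC
    oxBS-Seq level ~ 5mC
    Sampling noise can make the difference negative, so 5hmC is clamped at
    zero. No error model for sequencing depth or replicates is applied.
    """
    bs = methylation_level(bs_c, bs_t)
    ox = methylation_level(ox_c, ox_t)
    if bs is None or ox is None:
        return None
    return ox, max(bs - ox, 0.0)
```

With 8 C and 2 T reads in the BS-seq library and 6 C and 4 T reads in the oxBS-Seq library, the sketch returns roughly 0.6 for 5mC and 0.2 for 5hmC. At low depth such differences are dominated by sampling noise, which is why a model that accounts for sequencing and biological variation is needed before calling a site differentially modified.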

Public Health Relevance

Bisulfite sequencing is a powerful technology for studying how gene activity is controlled in normal cells and in human disease. We propose to develop novel computational methods that scientists urgently need to analyze large-scale bisulfite sequencing data sets. This work will provide valuable insights into the personalized diagnosis and treatment of many human diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J
Baylor College of Medicine
Anatomy/Cell Biology
Schools of Medicine
United States