DNA methylation, an epigenetic modification affecting the organization and function of the genome, plays a critical role in both normal development and disease. Bisulfite-based conversion of unmethylated Cs to Ts followed by deep sequencing (BS-seq) has emerged as the gold standard for studying genome-wide DNA methylation at single-nucleotide resolution. While progress in next-generation sequencing (NGS) makes whole-genome BS-seq (WGBS) increasingly affordable, interpreting the resulting massive amounts of data requires efficient bioinformatics methods. In this proposal, we will develop a series of novel bioinformatics methods for BS-seq data analysis. First, building on the early success of our BSMAP program, we will develop the next generation of bisulfite aligners. We will construct a bisulfite- and SNP-aware genome index for read mapping using IUPAC codes and a dynamic Burrows-Wheeler transform (DBWT). We will also distinguish CpG methylation from C/T SNPs and use GPU hardware acceleration to improve mapping speed. Second, we will develop a powerful differential methylation analysis algorithm that accounts for both sampling variation from sequencing and biological variation between replicates. We will also introduce a novel metric for evaluating both the statistical and the biological significance of differential methylation. This model will have enough power to detect differential methylation at single-CpG resolution in low-CpG-density regulatory regions, such as enhancers, at sequencing depths as low as 5- to 10-fold. Third, we will develop a comprehensive BS-seq data analysis pipeline using the Galaxy web interface and cloud computing. We will continuously integrate all the BS-seq tools we are developing, together with other public algorithms, according to the emerging needs of the epigenetics community. This pipeline will empower experimental biologists to perform most analyses on their own.
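The abstract does not describe the aligner's internals. As a minimal sketch of two ideas it names, bisulfite-aware matching and separating CpG methylation from C/T SNPs, the hypothetical Python below uses asymmetric C/T wildcard matching (the general strategy of wildcard-based bisulfite aligners such as BSMAP). A brute-force scan stands in for the IUPAC/DBWT index, and the methylation level at a reference C is taken as C/(C+T). It assumes directional libraries and forward-strand reads without indels. All function names and the 0.5 SNP threshold are invented for illustration and are not the proposal's method.

```python
"""Toy sketch of bisulfite-aware read matching and CpG methylation calling.

Assumptions (not from the proposal): directional library, reads on the
forward strand, no indels; all function names here are hypothetical.
"""
from __future__ import annotations

from collections import Counter


def bs_mismatches(read: str, ref: str) -> int:
    """Count mismatches under asymmetric C/T wildcard matching.

    An unmethylated reference C is read as T after bisulfite conversion,
    so a read T may match a reference C or T; a read C matches only C.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for r, g in zip(read, ref):
        if r == g or (r == "T" and g == "C"):
            continue
        mismatches += 1
    return mismatches


def best_offset(read: str, genome: str) -> tuple[int, int]:
    """Brute-force scan (stand-in for a real index) for the first offset
    with the fewest bisulfite-aware mismatches; returns (offset, mismatches)."""
    best_pos, best_mm = -1, len(read) + 1
    for i in range(len(genome) - len(read) + 1):
        mm = bs_mismatches(read, genome[i:i + len(read)])
        if mm < best_mm:
            best_pos, best_mm = i, mm
    return best_pos, best_mm


def methylation_level(forward_bases: list[str]) -> float | None:
    """Fraction of C among C+T calls at a reference C; None if uncovered."""
    counts = Counter(forward_bases)
    total = counts["C"] + counts["T"]
    return counts["C"] / total if total else None


def looks_like_ct_snp(reverse_bases: list[str], min_frac: float = 0.5) -> bool:
    """Flag a likely C/T SNP from opposite-strand reads (forward orientation).

    Bisulfite does not alter the G paired with a forward-strand C, so an A
    in reverse-strand reads points to a genuine C>T variant rather than an
    unmethylated C. The 0.5 threshold is an arbitrary illustrative choice.
    """
    counts = Counter(reverse_bases)
    total = counts["G"] + counts["A"]
    return total > 0 and counts["A"] / total >= min_frac


if __name__ == "__main__":
    genome = "GATTACGCTTCGAAGT"
    # genome[4:12] is ACGCTTCG; the CpG C at 5 is kept (methylated),
    # while the Cs at 7 and 10 are converted to T.
    read = "ACGTTTTG"
    print(best_offset(read, genome))  # (4, 0)
    print(methylation_level(["C", "C", "T", "C"]))  # 0.75
    print(looks_like_ct_snp(["A", "A", "G"]))  # True
```

A real aligner would replace the linear scan with an index and would also handle reverse-strand reads, where the G/A wildcard plays the role of C/T.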
These bioinformatics methods will undergo extensive testing and experimental validation by our collaborators. Although this proposal focuses on CpG methylation measured by conventional BS-seq, our bioinformatics methods can be applied immediately to modified BS-seq protocols, such as oxBS-Seq and TAB-Seq, recently developed to profile 5mC and 5hmC, respectively. Finally, as a case study, we will apply these new methods to unravel the in vivo role of DNA methylation in hematopoietic malignancies. These experiments and follow-up validations will also enable us to improve the efficacy of our bioinformatics methods.
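The abstract does not say how the modified protocols are combined. One standard reading, sketched here as a worked formula assuming complete conversion and comparable coverage, is that conventional BS-seq reads both 5mC and 5hmC as C, oxBS-Seq reads only 5mC as C, and TAB-Seq reads only 5hmC as C. Writing C and T for the counts of C and T calls at one cytosine in each library (notation introduced here for illustration):

```latex
\hat{\beta}_{\mathrm{BS}} = \frac{C_{\mathrm{BS}}}{C_{\mathrm{BS}} + T_{\mathrm{BS}}} \approx \beta_{\mathrm{5mC}} + \beta_{\mathrm{5hmC}},
\qquad
\hat{\beta}_{\mathrm{ox}} = \frac{C_{\mathrm{ox}}}{C_{\mathrm{ox}} + T_{\mathrm{ox}}} \approx \beta_{\mathrm{5mC}},
\qquad
\hat{\beta}_{\mathrm{5hmC}} \approx \hat{\beta}_{\mathrm{BS}} - \hat{\beta}_{\mathrm{ox}},
\qquad
\hat{\beta}_{\mathrm{TAB}} = \frac{C_{\mathrm{TAB}}}{C_{\mathrm{TAB}} + T_{\mathrm{TAB}}} \approx \beta_{\mathrm{5hmC}}
```

For example, 18 C calls out of 20 in BS-seq (0.90) and 12 out of 20 in oxBS-Seq (0.60) would suggest a 5hmC level of roughly 0.30 at that site. At low depth this difference is noisy and can even be negative.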

Public Health Relevance

Bisulfite sequencing is a powerful technology for studying how gene activity is controlled in normal cells and in human disease. We propose to develop novel computational methods, urgently needed by scientists, to analyze large-scale bisulfite sequencing data sets. This work will provide valuable insights for the personalized diagnosis and treatment of many human diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pazin, Michael J
University of California Irvine
Schools of Medicine
United States
Park, Hyun Jung; Ji, Ping; Kim, Soyeon et al. (2018) 3' UTR shortening represses tumor-suppressor genes in trans by disrupting ceRNA crosstalk. Nat Genet 50:783-789
Mi, Wenyi; Zhang, Yi; Lyu, Jie et al. (2018) The ZZ-type zinc finger of ZZZ3 modulates the ATAC complex-mediated histone acetylation and gene activation. Nat Commun 9:3759
Su, Jianzhong; Huang, Yung-Hsin; Cui, Xiaodong et al. (2018) Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 19:108
Park, Hyun Jung; Kim, Soyeon; Li, Wei (2018) Model-based analysis of competing-endogenous pathways (MACPath) in human cancers. PLoS Comput Biol 14:e1006074
Zhang, Yilei; Shi, Jiejun; Liu, Xiaoguang et al. (2018) BAP1 links metabolic regulation of ferroptosis to tumour suppression. Nat Cell Biol 20:1181-1192
Hsu, Chih-Chao; Zhao, Dan; Shi, Jiejun et al. (2018) Gas41 links histone acetylation to H2A.Z deposition and maintenance of embryonic stem cell identity. Cell Discov 4:28
Hsu, Chih-Chao; Shi, Jiejun; Yuan, Chao et al. (2018) Recognition of histone acetylation by the GAS41 YEATS domain promotes H2A.Z deposition in non-small cell lung cancer. Genes Dev 32:58-69
Jeong, Mira; Park, Hyun Jung; Celik, Hamza et al. (2018) Loss of Dnmt3a Immortalizes Hematopoietic Stem Cells In Vivo. Cell Rep 23:1-10
Feng, Xin; Li, Lei; Wagner, Eric J et al. (2018) TC3A: The Cancer 3' UTR Atlas. Nucleic Acids Res 46:D1027-D1030
Mi, Wenyi; Guan, Haipeng; Lyu, Jie et al. (2017) YEATS2 links histone acetylation to tumorigenesis of non-small cell lung cancer. Nat Commun 8:1088

Showing the most recent 10 out of 60 publications