DNA methylation, an epigenetic modification affecting the organization and function of the genome, plays a critical role in both normal development and disease. Bisulfite-based conversion of unmethylated Cs to Ts followed by deep sequencing (BS-seq) has emerged as the gold standard for studying genome-wide DNA methylation at single-nucleotide resolution. While progress in next-generation sequencing (NGS) makes whole-genome BS-seq (WGBS) increasingly affordable, interpreting the resulting massive amount of data requires efficient bioinformatics methods. In this proposal, we will develop a series of novel bioinformatics methods for BS-seq data analysis. First, building on the early success of our BSMAP program, we will develop a next-generation bisulfite aligner. We will construct a bisulfite- and SNP-aware genome index for read mapping using IUPAC codes and a dynamic Burrows-Wheeler transform (DBWT). We will also distinguish CpG methylation from C/T SNPs and use GPU hardware acceleration to improve mapping speed. Second, we will develop a powerful differential methylation analysis algorithm that accounts for both sampling variation from sequencing and biological variation between replicates. We will also introduce a novel metric for evaluating both the statistical and biological significance of differential methylation. This model will have enough power to detect differential methylation at single-CpG resolution in low-CpG-density regulatory regions, such as enhancers, at sequencing depths as low as 5- to 10-fold. Third, we will develop a comprehensive BS-seq data analysis pipeline using the Galaxy web interface and cloud computing. We will integrate all the BS-seq tools we are developing, along with other public algorithms, on a continuous basis according to the emerging needs of the epigenetics community. This pipeline will empower experimental biologists to perform most analyses on their own.
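The chemistry behind BS-seq and the basic idea of bisulfite-aware matching can be illustrated with a minimal Python sketch. It is an illustration only, not the BSMAP or DBWT implementation: the function names (`bisulfite_convert`, `bisulfite_match`, `methylation_level`) are hypothetical, only forward-strand reads are handled, and SNPs and sequencing errors are ignored.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment of a forward-strand sequence.

    Unmethylated C becomes T; positions listed in `methylated`
    (0-based indices) are protected and remain C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bisulfite_match(read, ref):
    """Return True if `read` is compatible with `ref` under C-to-T conversion.

    A reference C may appear as C (methylated) or T (converted) in the read;
    every other base must match exactly. Forward strand only (assumption).
    """
    if len(read) != len(ref):
        return False
    return all(r == g or (g == "C" and r == "T") for r, g in zip(read, ref))


def methylation_level(c_count, t_count):
    """Fraction of reads reporting C (i.e. methylated) at one cytosine."""
    depth = c_count + t_count
    if depth == 0:
        raise ValueError("no coverage at this position")
    return c_count / depth


if __name__ == "__main__":
    reference = "ACGTCCGA"
    # Only the cytosine at index 1 (a CpG) is methylated in this toy example.
    read = bisulfite_convert(reference, methylated={1})
    print(read)                              # ACGTTTGA
    print(bisulfite_match(read, reference))  # True
    print(methylation_level(8, 2))           # 0.8
```

In this toy model a T observed at a reference C is always attributed to conversion; separating that signal from a genuine C/T SNP, as the abstract proposes, needs additional evidence that the sketch does not model.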
These bioinformatics methods will undergo extensive testing and experimental validation by our collaborators. Although this proposal focuses on CpG methylation measured by conventional BS-seq, our methods can be applied directly to modified BS-seq protocols, such as oxBS-Seq and TAB-Seq, recently developed for 5mC and 5hmC, respectively. Finally, as a case study, we will apply these new methods to unravel the in vivo role of DNA methylation in hematopoietic malignancies. These experiments and follow-up validations will also enable us to improve the efficacy of our bioinformatics methods.
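To make the second aim of the abstract concrete, the following Python sketch shows one simple way a single-CpG comparison could combine sampling variation (finite read depth) with biological variation (spread between replicates). It is not the proposed algorithm: the function names are hypothetical, and the rule of taking the larger of the two variance estimates, like the normal approximation, is an assumption chosen for brevity.

```python
import math


def group_mean_and_variance(counts):
    """Summarize one group at a single CpG.

    `counts` is a list of (methylated_reads, total_reads), one per replicate.
    Returns (mean methylation level, variance of that mean).
    """
    levels = [m / n for m, n in counts]
    k = len(levels)
    mean = sum(levels) / k
    # Sampling variation: average binomial variance of each replicate's estimate.
    sampling = sum(p * (1 - p) / n for p, (_, n) in zip(levels, counts)) / k
    # Observed spread between replicates (reflects biological plus sampling noise).
    observed = sum((p - mean) ** 2 for p in levels) / (k - 1) if k > 1 else 0.0
    # Heuristic (assumption): never trust a spread smaller than sampling noise alone.
    per_replicate = max(sampling, observed)
    return mean, per_replicate / k


def differential_methylation(group_a, group_b):
    """Return (methylation difference, two-sided p-value) for one CpG."""
    mean_a, var_a = group_mean_and_variance(group_a)
    mean_b, var_b = group_mean_and_variance(group_b)
    diff = mean_a - mean_b
    se = math.sqrt(var_a + var_b)
    if se == 0.0:
        # Degenerate case: no variance estimate available.
        return diff, (1.0 if diff == 0.0 else 0.0)
    z = diff / se
    # Two-sided p-value under a normal approximation.
    return diff, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Three replicates per group at roughly 10-fold depth (toy numbers).
    tumor = [(9, 10), (8, 10), (10, 10)]
    normal = [(2, 10), (3, 10), (1, 10)]
    diff, p = differential_methylation(tumor, normal)
    print(f"difference = {diff:.2f}, p = {p:.3g}")
```

The returned difference and p-value correspond loosely to the biological and statistical sides of significance mentioned in the abstract; how the proposal's metric combines them is not specified there, so the sketch reports them separately.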

Public Health Relevance

Bisulfite sequencing is a powerful technology for studying how gene activity is controlled in normal cells and in human diseases. We propose to develop novel computational methods that scientists urgently need to analyze large-scale bisulfite sequencing data sets. This work will provide valuable insights into the personalized diagnosis and treatment of many human diseases.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG007538-04
Application #: 9188819
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J
Project Start: 2013-12-19
Project End: 2018-11-30
Budget Start: 2016-12-01
Budget End: 2017-11-30
Support Year: 4
Fiscal Year: 2017
Total Cost:
Indirect Cost:
Name: Baylor College of Medicine
Department: Anatomy/Cell Biology
Type: Schools of Medicine
DUNS #: 051113330
City: Houston
State: TX
Country: United States
Zip Code: 77030
Jeong, Mira; Park, Hyun Jung; Celik, Hamza et al. (2018) Loss of Dnmt3a Immortalizes Hematopoietic Stem Cells In Vivo. Cell Rep 23:1-10
Feng, Xin; Li, Lei; Wagner, Eric J et al. (2018) TC3A: The Cancer 3' UTR Atlas. Nucleic Acids Res 46:D1027-D1030
Park, Hyun Jung; Ji, Ping; Kim, Soyeon et al. (2018) 3' UTR shortening represses tumor-suppressor genes in trans by disrupting ceRNA crosstalk. Nat Genet 50:783-789
Mi, Wenyi; Zhang, Yi; Lyu, Jie et al. (2018) The ZZ-type zinc finger of ZZZ3 modulates the ATAC complex-mediated histone acetylation and gene activation. Nat Commun 9:3759
Su, Jianzhong; Huang, Yung-Hsin; Cui, Xiaodong et al. (2018) Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 19:108
Park, Hyun Jung; Kim, Soyeon; Li, Wei (2018) Model-based analysis of competing-endogenous pathways (MACPath) in human cancers. PLoS Comput Biol 14:e1006074
Zhang, Yilei; Shi, Jiejun; Liu, Xiaoguang et al. (2018) BAP1 links metabolic regulation of ferroptosis to tumour suppression. Nat Cell Biol 20:1181-1192
Hsu, Chih-Chao; Zhao, Dan; Shi, Jiejun et al. (2018) Gas41 links histone acetylation to H2A.Z deposition and maintenance of embryonic stem cell identity. Cell Discov 4:28
Hsu, Chih-Chao; Shi, Jiejun; Yuan, Chao et al. (2018) Recognition of histone acetylation by the GAS41 YEATS domain promotes H2A.Z deposition in non-small cell lung cancer. Genes Dev 32:58-69
Mi, Wenyi; Guan, Haipeng; Lyu, Jie et al. (2017) YEATS2 links histone acetylation to tumorigenesis of non-small cell lung cancer. Nat Commun 8:1088

Showing the most recent 10 out of 60 publications