Bioinformatics

Li, Leping

Abstract

Characterization of constitutive CTCF/cohesin loci: a possible role in establishing topological domains in mammalian genomes

Recent studies suggested that human/mammalian genomes are divided into large, discrete domains that are units of chromosome organization. CTCF, a CCCTC-binding factor, has diverse roles in genome regulation, including transcriptional regulation, chromosome-boundary insulation, DNA replication, and chromatin packaging. It remains unclear whether a subset of CTCF binding sites plays a functional role in establishing or maintaining chromatin topological domains. We systematically analysed the genomic, transcriptomic and epigenetic profiles of the CTCF binding sites in 56 human cell lines from ENCODE. We identified 24,000 CTCF sites (referred to as constitutive sites) that were bound in more than 90% of the cell lines. Our analysis revealed: 1) constitutive CTCF loci were located in constitutive open chromatin and often co-localized with constitutive cohesin loci; 2) most constitutive CTCF loci were distant from transcription start sites and lacked CpG islands but were enriched with the full spectrum of CTCF motifs: a recently reported 33/34-mer and two other potentially novel motifs (22/26-mer); 3) more importantly, most constitutive CTCF loci were present in CTCF-mediated chromatin interactions detected by ChIA-PET, and these pair-wise interactions occurred predominantly within, but not between, topological domains identified by Hi-C. Our results suggest that the constitutive CTCF sites may play a role in organizing or maintaining the recently identified topological domains that are common across most human cells.

Developing an annotation and visualization tool for ChIP-seq data

A typical ChIP-seq experiment identifies tens of thousands of loci bound by a protein. Often, data analysis such as locus annotation is carried out by someone other than the biologist who generated the data.
Moreover, there is no easy way to graphically visualize all the loci simultaneously. Although one can submit one locus at a time to the UCSC genome browser, sequential visualization is practical for a few loci but not for tens of thousands of them. This limitation hampers biologists in discovering interesting loci for hypothesis generation. We have developed a publicly available tool for annotating and dynamically visualizing all loci from one or more ChIP-seq experiments. It is designed with non-bioinformaticians in mind and presents a straightforward user interface. Our server annotates each locus with respect to the known gene information available at NCBI and on the UCSC genome browser. It outputs the annotation result in any of several formats, including Excel spreadsheets, tab-separated text files, and HTML documents. The usual information is provided, such as the distance from a ChIP-seq locus to the nearest transcription start site and the symbol and description of the associated gene. More importantly, in the HTML output, each locus is displayed in a graphic window in the context of the respective genome. This allows the biologist to tell instantly whether a locus is intronic, exonic, or in the upstream promoter region. One can also zoom in or out to a smaller or larger region for visualization, exploration and discovery. The HTML output also displays a pie chart showing the distribution of the loci among UTRs, introns, exons, and promoters. A user can also search for a favorite gene or locus and make other kinds of queries. We expect that this project will significantly advance the state of the art in web-based genomics interfaces. The visualization system is based on modern interface principles and is designed to be intuitive and easy to use, rather than depending on extensive documentation.
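
As an illustration of the kind of per-locus annotation described above, the distance from a ChIP-seq locus to the nearest transcription start site can be computed with a binary search over sorted TSS coordinates. This is a minimal sketch, not the server's code: the function name `nearest_tss`, the single-chromosome input, and the decision to ignore strand are all assumptions made for the example.

```python
from bisect import bisect_left


def nearest_tss(locus_pos, tss_positions):
    """Return (signed_distance, tss) for the TSS closest to locus_pos.

    Hypothetical helper for illustration only.
    tss_positions: TSS coordinates on ONE chromosome, sorted ascending.
    The distance is locus_pos - tss; strand is ignored (a simplification).
    """
    if not tss_positions:
        raise ValueError("no TSS positions supplied")
    # Index of the first TSS at or to the right of the locus.
    i = bisect_left(tss_positions, locus_pos)
    # Only the neighbours on either side of that index can be nearest.
    candidates = tss_positions[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda t: abs(locus_pos - t))
    return locus_pos - best, best


if __name__ == "__main__":
    tss = [1000, 5000, 9000]          # made-up coordinates
    for locus in (500, 5200, 9500):   # made-up peak summits
        print(locus, nearest_tss(locus, tss))
```

Because each lookup is logarithmic in the number of TSSs, annotating tens of thousands of loci this way stays fast; a real annotator would also need to handle strand, multiple chromosomes, and gene-model features such as exons and UTRs.
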
We hope that the MiniBrowser Python/Java script interface widget developed for this project will also be useful for other web-based bioinformatics tools. Our tool allows the biologists who generated the data to explore their data themselves, with the benefit of their own intuition, so as to enhance discovery and hypothesis generation.

T-KDE: A method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets

A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. Knowing the locations of all constitutive sites for a protein of interest is a prerequisite for understanding these sites' functional relevance. Robust and efficient computational methods for identifying constitutive binding sites are lacking, however. We propose a method, T-KDE, to identify the locations of constitutive binding sites. T-KDE, which combines a binary range tree with a kernel density estimator, is applied to ChIP-seq data from multiple cell lines. Using a set of constitutive CTCF (CCCTC-binding factor) sites identified through motif analysis as the gold standard, we compared T-KDE with a binning-based approach and demonstrated that T-KDE performs better. Furthermore, we showed that T-KDE can identify additional constitutive sites that were missed by the motif-based approach in two possible scenarios: 1) a site may be bound in all cell lines but fail to reach the motif significance cutoff; 2) a site may be missed if the peak sequence used in the motif scan is not long enough. Motif analysis of the set of constitutive CTCF sites that failed to reach motif significance discovered two new CTCF motif variants. Using data from ENCODE on 22 transcription factors (TFs) in 112 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful.
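
The general idea of pooling peaks from many cell lines and locating density modes can be sketched as follows. This is not the T-KDE implementation: a sorted list with binary search stands in for the binary range tree, the gap-based grouping of peaks is an assumption, and the names `constitutive_sites`, `bandwidth`, `window` and `min_fraction` (with its 0.9 default) are hypothetical choices made for this example.

```python
import math
from bisect import bisect_left, bisect_right


def gaussian_kde_at(x, points, bandwidth):
    """Gaussian kernel density estimate at x from 1-D points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def constitutive_sites(peaks, n_cell_lines, bandwidth=50, window=200,
                       min_fraction=0.9):
    """Return [(mode_position, n_cell_lines_bound), ...] for one chromosome.

    peaks: iterable of (position, cell_line_id) pooled over all cell lines.
    All parameter names and defaults are assumptions for illustration.
    """
    peaks = sorted(peaks, key=lambda p: p[0])
    positions = [p for p, _ in peaks]
    sites, i = [], 0
    while i < len(peaks):
        # Group neighbouring peaks whose gaps do not exceed `window`.
        j = i + 1
        while j < len(peaks) and positions[j] - positions[j - 1] <= window:
            j += 1
        group = positions[i:j]
        # Locate the density mode of the group on a coarse grid.
        step = max(1, bandwidth // 5)
        grid = range(group[0], group[-1] + 1, step)
        mode = max(grid, key=lambda x: gaussian_kde_at(x, group, bandwidth))
        # Range query around the mode (binary search stands in for a range tree).
        a = bisect_left(positions, mode - window)
        b = bisect_right(positions, mode + window)
        bound = {cell_line for _, cell_line in peaks[a:b]}
        # Keep the site if enough distinct cell lines contribute a peak.
        if len(bound) / n_cell_lines >= min_fraction:
            sites.append((mode, len(bound)))
        i = j
    return sites


if __name__ == "__main__":
    # Made-up data: one locus bound in all 10 cell lines, one in only 3.
    shared = [(990 + 2 * k, k) for k in range(10)]
    partial = [(5000, 0), (5010, 1), (5020, 2)]
    print(constitutive_sites(shared + partial, n_cell_lines=10))
```

Compared with fixed-width binning, a density-based approach does not split a site whose peaks straddle a bin boundary, which is one plausible reason a kernel density estimator is attractive here.
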
Besides constitutive binding sites for a given TF, T-KDE can identify genomic hot spots where several different proteins bind and, conversely, cell-specific sites bound by a given protein. Using 116 CTCF ChIP-seq datasets as an example, we showed that T-KDE is relatively robust to the choice of the free parameter and is highly accurate when compared with the identification of constitutive binding sites through motif analysis.

We also have several long-standing collaborations with intramural investigators. Specifically:
a) Identifying differentially expressed genes in wild-type Zfp36l3 and Zfp36l3 knockout (KO) mouse placentas using Affymetrix and Agilent arrays and deep sequencing (mRNA-seq) (PI Blackshear).
b) Identifying Zfp36l3 targets by RNA-seq analysis (PI Blackshear).
c) Role of Med13 in embryo development (PI Williams).
d) Genome-wide tamoxifen-induced ER alpha binding specificity (PI Korach).

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAES101765-10
Application #: 8734141
Support Year: 10
Fiscal Year: 2013
Total Cost: $1,097,764

Institution

Name: National Institute of Environmental Health Sciences

Publications

Miao, Yi-Liang; Gambini, Andrés; Zhang, Yingpei et al. (2018) Mediator complex component MED13 regulates zygotic genome activation and is required for postimplantation development in the mouse. Biol Reprod 98:449-464

Nguyen, Thuy-Ai T; Grimm, Sara A; Bushel, Pierre R et al. (2018) Revealing a human p53 universe. Nucleic Acids Res :

Ungewitter, Erica K; Rotgers, Emmi; Kang, Hong Soon et al. (2018) Loss of Glis3 causes dysregulation of retrotransposon silencing and germ cell demise in fetal mouse testis. Sci Rep 8:9662

Roy, Sumedha; Moore, Amanda J; Love, Cassandra et al. (2018) Id Proteins Suppress E2A-Driven Invariant Natural Killer T Cell Development prior to TCR Selection. Front Immunol 9:42

Li, Yuanyuan; Umbach, David M; Li, Leping (2017) Putative genomic characteristics of BRAF V600K versus V600E cutaneous melanoma. Melanoma Res 27:527-535

Fan, Zheng; Ahn, Mihye; Roth, Heidi L et al. (2017) Sleep Apnea and Hypoventilation in Patients with Down Syndrome: Analysis of 144 Polysomnogram Studies. Children (Basel) 4:

Ren, Natalie S X; Ji, Ming; Tokar, Erik J et al. (2017) Haploinsufficiency of SIRT1 Enhances Glutamine Metabolism and Promotes Cancer Development. Curr Biol 27:483-494

Li, Yuanyuan; Kang, Kai; Krahn, Juno M et al. (2017) A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data. BMC Genomics 18:508

Lowe, Julie M; Nguyen, Thuy-Ai; Grimm, Sara A et al. (2017) The novel p53 target TNFAIP8 variant 2 is increased in cancer and offsets p53-dependent tumor suppression. Cell Death Differ 24:181-191

Xu, Zongli; Niu, Liang; Li, Leping et al. (2016) ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Res 44:e20

Showing the most recent 10 out of 36 publications
