Systematic Identification of Core Regulatory Circuitry from ENCODE Data

Beer, Michael

Abstract

While much progress has been made generating high-quality chromatin state and accessibility data from the ENCODE and Roadmap consortia, accurately identifying cell-type specific enhancers from these data remains a significant challenge. We have recently developed a computational approach (gkmSVM) to predict regulatory elements from DNA sequence, and we have shown that when gkmSVM is trained on DHS data from each of the human and mouse ENCODE and Roadmap cells and tissues, it can predict both cell-specific enhancer activity and the impact of regulatory variants (deltaSVM) with greater precision than alternative approaches. The gkmSVM model encapsulates a set of cell-type specific weights describing the regulatory binding site vocabulary controlling chromatin accessibility in each cell type. A striking observation is that the significant gkmSVM weights are generally identifiable with a small (~20) set of TF binding sites which vary by cell type, consistent with the hypothesis that cell-type specific expression programs are controlled by a small set of core factors tightly coupled in mutually interacting regulatory circuits. Perturbations of these core regulators enable transitions between stable differentiated cell-type states of this genetic circuit. Here, we will use gkmSVM to systematically identify the core regulatory circuitry in all existing ENCODE and Roadmap human and mouse cell lines and tissues, and produce DNA sequence based genomic regulatory maps and fine-scale predictions of core regulator binding sites within predicted regulatory regions. We will generate binding site models for core regulators in each cell type and assess the accuracy of our predictions through direct experimental validation.
The value of this map critically depends on its accuracy, so we demonstrate that gkmSVM predictions consistently outperform alternative methods in massively parallel enhancer reporter and luciferase validation assays, in blind community assessments of regulatory element predictions (CAGI), and in predicting validated causal disease-associated variants. In contrast, we show that methods using PWM descriptions of TF binding sites are significantly less accurate. We will produce base-pair resolution predictions of the cell-specific TF binding sites (TFBS) within broader regulatory regions detected by multiple ENCODE epigenomic Mapping datasets, and we will test these TFBS predictions in collaboration with Functional Characterization Centers (FCC). Our regulatory maps will help design and inform focused experiments probing regulatory mechanisms, and aid in the interpretation of disease-associated non-coding variants.
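
The idea of scoring a regulatory variant from sequence weights (deltaSVM) can be illustrated with a minimal sketch. It is not the gkmSVM implementation: it assumes plain contiguous k-mers rather than gapped k-mers, and the weight table `TOY_WEIGHTS`, the helper names `kmers_overlapping` and `delta_score`, and the example sequence are all hypothetical, invented only for illustration. The score is the summed weight of the k-mers covering the variant in the alternate allele minus the same sum for the reference allele.

```python
from typing import Dict, Iterator

# Hypothetical weights for illustration only; real weights would come
# from a model trained on accessibility data for one cell type.
TOY_WEIGHTS: Dict[str, float] = {"GAT": 1.5, "ATA": 0.5, "GCT": -1.0}


def kmers_overlapping(seq: str, pos: int, k: int) -> Iterator[str]:
    """Yield every contiguous k-mer of seq that covers 0-based position pos."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    for start in range(first_start, last_start + 1):
        yield seq[start:start + k]


def delta_score(ref_seq: str, pos: int, alt_base: str,
                weights: Dict[str, float], k: int) -> float:
    """Alt-minus-ref sum of k-mer weights over k-mers covering the variant."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(weights.get(km, 0.0)
                    for km in kmers_overlapping(ref_seq, pos, k))
    alt_total = sum(weights.get(km, 0.0)
                    for km in kmers_overlapping(alt_seq, pos, k))
    return alt_total - ref_total


# Substituting A->C at position 2 of "AGATAA" removes the positively
# weighted k-mers GAT and ATA and creates the negatively weighted GCT.
score = delta_score("AGATAA", 2, "C", TOY_WEIGHTS, k=3)
print(score)  # -3.0
```

In this toy case the negative score indicates that the substitution removes sequence features the hypothetical model associates with accessibility; k-mers absent from the table are treated as having zero weight.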

Public Health Relevance

We propose to develop computational tools to systematically identify the core TF regulators using DNA sequence based machine learning (ML) in all existing ENCODE and Roadmap human and mouse cell lines and tissues, and to produce base-pair resolution predictions of the cell specific TF binding sites (TFBS) within broader regulatory regions detected by multiple ENCODE epigenomic Mapping datasets, and to test these TFBS predictions in collaboration with Functional Characterization Centers (FCC).

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01HG009380-04S1
Application #: 10238262
Program Officer: Gilchrist, Daniel A

Project Start: 2017-02-01
Project End: 2022-01-31
Budget Start: 2021-02-01
Budget End: 2022-01-31
Support Year: 4
Fiscal Year: 2021

Institution

Name: Johns Hopkins University
Department: Biomedical Engineering
Type: Schools of Medicine
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2021 U01 HG	Systematic Identification of Core Regulatory Circuitry from ENCODE Data Beer, Michael A. / Johns Hopkins University
NIH 2020 U01 HG	Systematic Identification of Core Regulatory Circuitry from ENCODE Data Beer, Michael A. / Johns Hopkins University
NIH 2019 U01 HG	Systematic Identification of Core Regulatory Circuitry from ENCODE Data Beer, Michael A. / Johns Hopkins University
NIH 2018 U01 HG	Systematic Identification of Core Regulatory Circuitry from ENCODE Data Beer, Michael A. / Johns Hopkins University
NIH 2017 U01 HG	Systematic Identification of Core Regulatory Circuitry from ENCODE Data Beer, Michael A. / Johns Hopkins University	$406,169

Publications

Gate, Rachel E; Cheng, Christine S; Aiden, Aviva P et al. (2018) Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat Genet 50:1140-1150

Beer, Michael A (2017) Predicting enhancer activity and variant impact using gkm-SVM. Hum Mutat 38:1251-1258

Kreimer, Anat; Zeng, Haoyang; Edwards, Matthew D et al. (2017) Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 38:1240-1250

Migeon, Barbara R; Beer, Michael A; Bjornsson, Hans T (2017) Embryonic loss of human females with partial trisomy 19 identifies region critical for the single active X. PLoS One 12:e0170403
