New methods for quantitative modeling of protein-DNA interactions

Gordan, Raluca

Abstract

Accurate predictions of transcription factor (TF)-DNA interactions across the human genome are critical for deciphering transcriptional regulatory networks in healthy and diseased cells, as well as for understanding the phenotypic effects of polymorphisms in non-coding genomic regions. However, the most widely used model of TF-DNA binding affinity, the position weight matrix (PWM), is known to provide only an approximation of the true sequence specificity of TFs, because it assumes independence among the base pairs in TF binding sites. More complex binding models have been proposed, but their improvement over PWMs was marginal, either because of limitations of the training data (i.e., strong biases, noise, artifacts, or confounding factors) or because the models were not flexible enough to capture complex dependencies in TF binding sites. As a result, current DNA binding models have a limited ability to predict the effects of non-coding genetic variation on TF binding, and they cannot be used to resolve functional differences between closely related TFs with similar DNA binding domains but distinct regulatory roles in the cell. The objective of this application is to overcome these limitations by generating high-quality data that will be used to train flexible statistical models, yielding TF-DNA binding affinity predictions with accuracies similar to those of experimental in vitro assays. The central hypothesis, based on preliminary results and previous work, is that both better affinity data and better statistical models are needed in order to predict TF-DNA interactions in human cells with significantly higher accuracy than current models. High-quality binding affinity data for 40 human TFs will be generated in Aim 1 using a unique combination of in vitro assays carefully designed to minimize bias and noise, thus making the data ideal for training complex models.
Novel TF-DNA binding models will be developed in Aim 2 using state-of-the-art statistical methods: support vector regression, nonparametric Bayes modeling, and conditional tensor factorization. The models will be tested experimentally in vitro, and by leveraging in vivo data from the ENCODE project.
In Aim 3, the new binding models will be used in two applications: 1) to predict the quantitative effects of non-coding single nucleotide polymorphisms on TF binding affinities and TF binding levels, and 2) to predict differential in vivo DNA binding of closely related TFs with similar DNA binding domains but distinct regulatory functions in the cell. Such applications are not possible using current models. Overall, we anticipate that the binding affinity models developed in this project will allow for much more accurate predictions of regulatory TF-DNA interactions than possible using current models, which is significant because it will lead to a better understanding of gene regulatory programs and their misregulation during disease, including understanding the cascade of events that link genetic variation to human disease.
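
To make the baseline concrete, the following is a minimal sketch of how a standard log-odds PWM scores a binding site and how that score can be used to estimate the effect of a single nucleotide change. It illustrates the independence assumption discussed in the abstract: each position contributes to the score separately, so dependencies between base pairs cannot be represented. The 4-bp matrix, the uniform background, and all function names are hypothetical choices made for illustration; they are not data or methods from this project.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix for a 4-bp site (one dict per
# position). These numbers are illustrative only, not measured data.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def pwm_from_pfm(pfm, background=0.25):
    """Convert base frequencies to log2-odds weights against a uniform background."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pfm]


def score_site(pwm, site):
    """Score one site. The score is a plain sum over positions, which is
    exactly the independence assumption: no term couples two positions."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM width")
    return sum(col[base] for col, base in zip(pwm, site))


def best_score(pwm, seq):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the PWM")
    return max(score_site(pwm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


def snp_effect(pwm, ref_seq, pos, alt_base):
    """Change in best PWM score when the base at index pos is replaced by alt_base."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)


TOY_PWM = pwm_from_pfm(TOY_PFM)
```

For example, `snp_effect(TOY_PWM, "TTACGTTT", 3, "A")` is negative, because the substitution disrupts the best-matching site `ACGT`. Under this model the predicted effect of a variant depends only on the weights at the changed position, regardless of the neighboring bases, which is one reason more flexible models are proposed in Aim 2.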

Public Health Relevance

The computational models and the experimental approaches developed in this project will lead to a better understanding of how transcription factors recognize their specific DNA sites across the genome, and how these interactions are disrupted by mutations or polymorphisms in the binding sites. Given that many genetic variations associated with complex human diseases are located in non-coding regions of the genome, our models can be used to accurately predict the effect of such variations on transcription factor-DNA binding, and to prioritize them for further investigation into the genetic causes of human diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM117106-04
Application #: 9546780
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2015-09-25
Project End: 2020-08-31
Budget Start: 2018-09-01
Budget End: 2019-08-31
Support Year: 4
Fiscal Year: 2018
Total Cost
Indirect Cost

Institution

Name: Duke University
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 044387793

City: Durham
State: NC
Country: United States
Zip Code: 27705

Related projects


NIH 2019 R01 GM	New methods for quantitative modeling of protein-DNA interactions Gordan, Raluca / Duke University
NIH 2018 R01 GM	New methods for quantitative modeling of protein-DNA interactions Gordan, Raluca / Duke University
NIH 2017 R01 GM	New methods for quantitative modeling of protein-DNA interactions Gordan, Raluca / Duke University
NIH 2016 R01 GM	New methods for quantitative modeling of protein-DNA interactions Gordan, Raluca / Duke University
NIH 2015 R01 GM	New methods for quantitative modeling of protein-DNA interactions Gordan, Raluca / Duke University	$363,583

Publications

Shen, Ning; Zhao, Jingkang; Schipper, Joshua L et al. (2018) Divergence in DNA Specificity among Paralogous Transcription Factors Contributes to Their Differential In Vivo Binding. Cell Syst 6:470-483.e8

Afek, A; Tagliafierro, L; Glenn, O C et al. (2018) Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene. Neurogenetics 19:135-144

Shats, Igor; Deng, Michael; Davidovich, Adam et al. (2017) Expression level is a key determinant of E2F1-mediated cell fate. Cell Death Differ 24:626-637

Zhao, Jingkang; Li, Dongshunyi; Seo, Jungkyun et al. (2017) Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding. Res Comput Mol Biol 10229:336-352

Frank, Christopher L; Manandhar, Dinesh; Gordân, Raluca et al. (2016) HDAC inhibitors cause site-specific chromatin remodeling at PU.1-bound enhancers in K562 cells. Epigenetics Chromatin 9:15
