Gene expression is largely controlled by regulating the transcription of specific segments of the genome. Cataloging and understanding the protein-DNA interactions that control gene expression is essential to understanding the normal network of interactions and how that network is perturbed in disease states. New technologies have greatly increased the amount of DNA sequence available and have generated many new types of data related to protein-DNA interactions. This gives us the opportunity to develop computational models that are much more comprehensive than those previously available, but increasing the accuracy of the models is essential to maximizing the biological information obtained from the data. The objectives of this proposal are to develop improved algorithms for modeling protein-DNA specificity that have greatly reduced false-positive and false-negative rates compared to current methods. These methods will take various types of data as input, including qualitative and quantitative binding site data, and will develop models of appropriate complexity for each important factor. We will also develop improved software for the discovery of regulatory sites using data currently being generated in high-throughput experiments. Finally, we will develop improved methods for determining which transcription factors encoded by the genome interact with which motifs identified by motif discovery methods.

Project Narrative: Many diseases are associated with mis-regulation of gene expression. Our studies will help to understand the normal mechanisms of gene regulation and to pinpoint the specific points of error in those disease states.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG000249-21
Application #
7822910
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Good, Peter J
Project Start
1989-04-01
Project End
2012-04-30
Budget Start
2010-05-01
Budget End
2011-04-30
Support Year
21
Fiscal Year
2010
Total Cost
$357,100
Indirect Cost
Name
Washington University
Department
Genetics
Type
Schools of Medicine
DUNS #
068552207
City
Saint Louis
State
MO
Country
United States
Zip Code
63130
Ruan, Shuxiang; Stormo, Gary D (2018) Comparison of discriminative motif optimization using matrix and DNA shape-based models. BMC Bioinformatics 19:86
Chang, Yiming K; Zuo, Zheng; Stormo, Gary D (2018) Quantitative profiling of BATF family proteins/JUNB/IRF hetero-trimers using Spec-seq. BMC Mol Biol 19:5
Hu, Caizhen; Malik, Vikas; Chang, Yiming Kenny et al. (2017) Coop-Seq Analysis Demonstrates that Sox2 Evokes Latent Specificities in the DNA Recognition by Pax6. J Mol Biol 429:3626-3634
Roy, Basab; Zuo, Zheng; Stormo, Gary D (2017) Quantitative specificity of STAT1 and several variants. Nucleic Acids Res 45:8199-8207
Xiao, Shu; Lu, Jia; Sridhar, Bharat et al. (2017) SMARCAD1 Contributes to the Regulation of Naive Pluripotency by Interacting with Histone Citrullination. Cell Rep 18:3117-3128
Zuo, Zheng; Roy, Basab; Chang, Yiming Kenny et al. (2017) Measuring quantitative effects of methylation on transcription factor-DNA binding affinity. Sci Adv 3:eaao1799
Ruan, Shuxiang; Stormo, Gary D (2017) Inherent limitations of probabilistic models for protein-DNA binding specificity. PLoS Comput Biol 13:e1005638
Ruan, Shuxiang; Swamidass, S Joshua; Stormo, Gary D (2017) BEESEM: estimation of binding energy models using HT-SELEX data. Bioinformatics 33:2288-2295
Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen et al. (2017) Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq. Nucleic Acids Res 45:832-845
Stormo, Gary D; Roy, Basab (2016) DNA Structure Helps Predict Protein Binding. Cell Syst 3:216-218

Showing the most recent 10 out of 109 publications