We will build on our success in identifying important functional domains in unaligned biological sequences by extending the methodology to a wider variety of problems. Specifically, our greedy algorithm for identifying regulatory sites in DNA, which scores candidate alignments by information content, will be extended to handle patterns that include gaps in the alignments. This method should work well on both protein and nucleic acid sequences and can be thought of as a means of discovering "profiles" in unaligned sequences that are known to be functionally related. Both global and local alignment programs will be developed, and a number of refinement procedures will be explored to see which give the best likelihood of reaching the global optimum with the greedy algorithm. We will also examine other types of information that can be used to identify common patterns in functionally related sequences. For example, mutual information can be used to rank RNA secondary structures, and better ways of utilizing amino acid similarities should also be possible. We will specifically examine the database of eukaryotic promoters, both to obtain better representations of known binding sites and, we hope, to discover unknown relationships between promoters. We will also enhance the user interface of the programs we develop to provide graphical displays of the identified sites, which will facilitate the identification of spatial relationships between sites, both of the same type and of different types. We will continue work on alternative pattern recognition methods, such as Expectation-Maximization and neural networks, both in-house and through informal collaborations. We will also continue collaborations on biochemical applications of these methods, both with other laboratories and with other members of the Stormo lab.
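As an illustration of the two scoring quantities named in the abstract, the following minimal sketch computes the information content of an ungapped alignment of sites and the mutual information between two alignment columns. It is not the project's software: the function names `information_content` and `mutual_information`, the uniform default background, and the omission of any small-sample correction are assumptions made here for clarity.

```python
import math
from collections import Counter


def information_content(sites, background=None, alphabet="ACGT"):
    """Information content (bits) of an ungapped alignment of equal-length sites.

    Computed as the sum over columns of sum_b f_b * log2(f_b / p_b), where f_b
    is the observed frequency of letter b in the column and p_b its background
    frequency. Assumption: uniform background unless one is supplied.
    """
    if background is None:
        background = {b: 1.0 / len(alphabet) for b in alphabet}
    n = len(sites)
    total = 0.0
    for column in zip(*sites):  # iterate over alignment columns
        for base, count in Counter(column).items():
            f = count / n
            total += f * math.log2(f / background[base])
    return total


def mutual_information(sites, i, j):
    """Mutual information (bits) between alignment columns i and j.

    High values indicate covarying positions, as expected for base-paired
    positions in an RNA secondary structure.
    """
    n = len(sites)
    fi = Counter(s[i] for s in sites)
    fj = Counter(s[j] for s in sites)
    fij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi
```

A greedy search of the kind described would repeatedly add to a growing alignment the candidate site that most increases `information_content`; that search loop is not shown here.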

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG000249-07
Application #
2208666
Study Section
Genome Study Section (GNM)
Project Start
1989-04-01
Project End
1997-03-31
Budget Start
1995-04-01
Budget End
1996-03-31
Support Year
7
Fiscal Year
1995
Total Cost
Indirect Cost
Name
University of Colorado at Boulder
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
City
Boulder
State
CO
Country
United States
Zip Code
80309
Ruan, Shuxiang; Stormo, Gary D (2018) Comparison of discriminative motif optimization using matrix and DNA shape-based models. BMC Bioinformatics 19:86
Chang, Yiming K; Zuo, Zheng; Stormo, Gary D (2018) Quantitative profiling of BATF family proteins/JUNB/IRF hetero-trimers using Spec-seq. BMC Mol Biol 19:5
Ruan, Shuxiang; Swamidass, S Joshua; Stormo, Gary D (2017) BEESEM: estimation of binding energy models using HT-SELEX data. Bioinformatics 33:2288-2295
Chang, Yiming K; Srivastava, Yogesh; Hu, Caizhen et al. (2017) Quantitative profiling of selective Sox/POU pairing on hundreds of sequences in parallel by Coop-seq. Nucleic Acids Res 45:832-845
Hu, Caizhen; Malik, Vikas; Chang, Yiming Kenny et al. (2017) Coop-Seq Analysis Demonstrates that Sox2 Evokes Latent Specificities in the DNA Recognition by Pax6. J Mol Biol 429:3626-3634
Roy, Basab; Zuo, Zheng; Stormo, Gary D (2017) Quantitative specificity of STAT1 and several variants. Nucleic Acids Res 45:8199-8207
Xiao, Shu; Lu, Jia; Sridhar, Bharat et al. (2017) SMARCAD1 Contributes to the Regulation of Naive Pluripotency by Interacting with Histone Citrullination. Cell Rep 18:3117-3128
Zuo, Zheng; Roy, Basab; Chang, Yiming Kenny et al. (2017) Measuring quantitative effects of methylation on transcription factor-DNA binding affinity. Sci Adv 3:eaao1799
Ruan, Shuxiang; Stormo, Gary D (2017) Inherent limitations of probabilistic models for protein-DNA binding specificity. PLoS Comput Biol 13:e1005638
Stormo, Gary D; Roy, Basab (2016) DNA Structure Helps Predict Protein Binding. Cell Syst 3:216-218

Showing the most recent 10 out of 109 publications