We will build on our success in identifying important functional domains in unaligned biological sequences by extending the methodology to a wider variety of problems. Specifically, our greedy algorithm for identifying regulatory sites in DNA, which scores candidate alignments by their information content, will be extended to handle patterns that include gaps in the alignments. This method should work well on both protein and nucleic acid sequences and can be thought of as a means of discovering "profiles" in unaligned sequences that are known to be functionally related. Both global and local alignment programs will be developed, and a number of refinement procedures will be explored to see which give the greedy algorithm the best likelihood of reaching the global optimum. We will also examine other types of information that can be used to identify common patterns in functionally related sequences. For example, mutual information can be used to rank RNA secondary structures, and better ways of utilizing amino acid similarities should also be possible. We will specifically examine the database of eukaryotic promoters, both to obtain better representations of known binding sites and, we hope, to discover unknown relationships between promoters. We will enhance the user interface of the programs we develop to provide graphical displays of the identified sites, which will facilitate the identification of spatial relationships between sites of the same type and of different types. We will continue work on alternative pattern recognition methods, such as Expectation-Maximization and neural networks, both in-house and through informal collaborations. Finally, we will continue collaborations on biochemical applications of these methods, both with other laboratories and with other members of the Stormo lab.
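
As an illustration of the information content scoring mentioned above, the following is a minimal sketch and not the project's actual program. It assumes ungapped, equal-length sites, a uniform background distribution unless one is supplied, and no small-sample correction. The function names `column_information` and `alignment_information` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT", background=None):
    """Information content (in bits) of one alignment column.

    Computed as sum_b f_b * log2(f_b / p_b), where f_b is the observed
    frequency of symbol b in the column and p_b its background frequency.
    Assumption: a uniform background is used when none is given.
    """
    if background is None:
        background = {s: 1.0 / len(alphabet) for s in alphabet}
    counts = Counter(column)
    n = len(column)
    info = 0.0
    for symbol, count in counts.items():
        f = count / n
        info += f * math.log2(f / background[symbol])
    return info


def alignment_information(sites, alphabet="ACGT", background=None):
    """Total information content of a set of aligned, equal-length sites.

    Columns are treated as independent, so the score is the sum of the
    per-column values.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return sum(
        column_information([s[i] for s in sites], alphabet, background)
        for i in range(length)
    )


if __name__ == "__main__":
    # Hypothetical example sites, not taken from any real data set.
    sites = ["ACGT", "ACGA", "ACGT", "ACGC"]
    print(alignment_information(sites))
```

A greedy search of the kind described would, under these assumptions, add one site at a time to a growing alignment and keep the candidate alignments with the highest score; handling gaps would require a scoring scheme beyond this sketch.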