Several new Gibbs sampling strategies for detecting motifs have been developed: (i) a "motif sampling strategy" that optimally partitions a set of input sequences into regions corresponding to different motif models; (ii) a "sequence sampling strategy" that partitions a set of input sequences into two subsets, those sequences that contain a specified motif or group of motifs and those that do not; and (iii) a column sampling strategy that can be used in conjunction with either the motif or the sequence sampler and that optimizes the information content of the motif models. In addition, a non-parametric test was developed to assess the statistical significance of motifs detected by any of the Gibbs methods. These methods have been implemented in the C programming language, and the samplers were used to find motifs conserved among sets of distantly related proteins. The motifs found include a very subtle repetitive motif characteristic of certain beta-strands present in bacterial and mitochondrial porins. A search of bacterial proteins, using a procedure for scanning a sequence for internal repeats, revealed that the porin motif also occurs in other bacterial membrane proteins that are not known to be porins. The significance of the project lies in the development of fast and sensitive methods for detecting motifs in protein and nucleic acid sequences.
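
The abstract does not say how the information content of a motif model is defined. As a rough illustration only, the sketch below assumes the common relative-entropy definition, in which each aligned column is scored in bits against background residue frequencies and the column scores are summed. The function names, the uniform DNA background, and the pseudocount handling are hypothetical choices made for this example; they are not taken from the project's C implementation.

```python
import math
from collections import Counter


def column_information(column, background, pseudocount=0.0):
    """Relative entropy (in bits) of one aligned column against background frequencies.

    Assumed definition: sum over residues of p * log2(p / q), where p is the
    observed (optionally pseudocount-smoothed) frequency and q the background.
    """
    counts = Counter(column)
    total = len(column) + pseudocount
    info = 0.0
    for residue, q in background.items():
        # Pseudocounts are distributed in proportion to the background.
        p = (counts.get(residue, 0) + pseudocount * q) / total
        if p > 0.0:
            info += p * math.log2(p / q)
    return info


def motif_information(sites, background, pseudocount=0.0):
    """Sum of column information over all columns of equal-length aligned sites."""
    width = len(sites[0])
    return sum(
        column_information([site[j] for site in sites], background, pseudocount)
        for j in range(width)
    )


# Hypothetical uniform background over the four DNA bases.
UNIFORM_DNA = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```

Under this assumed definition, a column sampler would prefer columns with high scores: a fully conserved column scores highest against a uniform background, while a column that matches the background scores zero.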

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000056-02
Application #
3759324
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1994
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code