The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. The algorithm extends previous work in this area (Lawrence et al., Science 262:208-214, 1993) in three ways: 1) the requirement for specifying the number of motifs in each sequence has been relaxed; 2) the length of the motif is now determined automatically by the algorithm; 3) a nonparametric test for the significance of the alignment has been developed. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. This feature permits the algorithm to align the sequences and classify segments into submodels simultaneously. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of thirty-two very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul et al., 1990) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
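For orientation, the basic idea behind Gibbs sampling for motif alignment can be sketched in a few lines. The sketch below is a minimal illustration of the simpler site sampler, which assumes exactly one motif of fixed width per sequence; it is not the extended algorithm described in this abstract. The function name `gibbs_site_sampler`, its parameters, the pseudocount value, and the restriction to the 20 standard amino acid letters are all assumptions made for the example.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumption: only the 20 standard residues occur


def gibbs_site_sampler(seqs, width, iters=200, pseudo=0.5, seed=0):
    """Return one sampled motif start per sequence (basic site sampler sketch)."""
    rng = random.Random(seed)
    n_letters = len(ALPHABET)

    # Background residue frequencies over all sequences, with pseudocounts.
    total = sum(len(s) for s in seqs)
    bg = {
        a: (sum(s.count(a) for s in seqs) + pseudo) / (total + pseudo * n_letters)
        for a in ALPHABET
    }

    # Start from a random alignment: one motif position per sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Position-specific counts built from every sequence except seq i.
            counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[starts[j] + k]] += 1.0
            denom = (len(seqs) - 1) + pseudo * n_letters

            # Log likelihood ratio (motif model vs. background) for each start.
            log_ratios = []
            for p in range(len(seq) - width + 1):
                lr = 0.0
                for k in range(width):
                    residue = seq[p + k]
                    lr += math.log(counts[k][residue] / denom / bg[residue])
                log_ratios.append(lr)

            # Sample a new start for seq i in proportion to the ratios
            # (shifted by the maximum for numerical stability).
            top = max(log_ratios)
            weights = [math.exp(lr - top) for lr in log_ratios]
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]

    return starts
```

Calling `gibbs_site_sampler(seqs, width=8)` on a list of protein strings returns one start index per sequence. The extensions named in the abstract (a variable number of motifs per sequence, automatic motif length, multiple motif models, and the significance test) are deliberately left out of this sketch.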

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000067-01
Application #
5203638
Support Year
1
Fiscal Year
1995
Name
National Library of Medicine
United States