The detection and alignment of locally conserved regions in multiple sequences can provide insight into protein structure, function, and evolution. A Gibbs sampling algorithm has been shown to be useful for multiple sequence alignment when the relationship between the sequences is subtle. However, when the sequences under study contain many common collinear motifs, the sampler can have difficulty converging. In this project we seek to develop a version of the algorithm that overcomes this deficiency. The algorithm is a forward-backward algorithm that takes advantage of a recursive property similar to the one commonly used by dynamic programming algorithms developed for the alignment of pairs of sequences. The algorithm has two steps: 1) On the forward step, the recursive relationship is employed to obtain the joint probability distribution of the locations of all of the collinear motifs in the sequence by summing over all partial assignments. 2) The backward step begins with the marginal distribution of the location of the last motif and samples backward through the sequence to draw a realization from the joint distribution. This forward-backward algorithm is applied to each of the sequences in the set, taking as given the sampled alignment in each of the other sequences.

In sequence alignment the assignment of gap penalties is always difficult. In this project we seek to develop a statistical method for determining gap penalties and for assessing the statistical significance of alternative gapping models. These statistically based gap penalties rest on the premise that under the null hypothesis all alignments are equally likely. Consequently, more flexible alignments, e.g. those that permit more gaps, must be down-weighted to account for the excess number of alignments that emerge when the alignment conditions are relaxed. We are also exploring the application of these methods to the detection of subtly related sequences in a database.
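The forward and backward steps described in the abstract can be illustrated with a minimal sketch for a single sequence. This is not the project's implementation. It assumes that motifs may not overlap, that each motif has a fixed width, and that a table of per-position weights (for example, likelihood ratios derived from the alignment currently sampled in the other sequences) is already available. The names `forward`, `sample_backward`, `weights`, and `widths` are hypothetical.

```python
import random
from itertools import accumulate


def forward(weights, widths, seq_len):
    """Forward step (sketch).

    F[k][i] is the summed weight of every collinear, non-overlapping
    placement of motifs 0..k in which motif k starts at position i.
    weights[k][i] is an assumed, precomputed weight for motif k at i.
    """
    F = []
    for k, (w_k, width) in enumerate(zip(weights, widths)):
        row = [0.0] * seq_len
        if k > 0:
            # prefix[j] = F[k-1][0] + ... + F[k-1][j]
            prefix = list(accumulate(F[k - 1]))
        for i in range(seq_len - width + 1):
            if k == 0:
                prior = 1.0
            else:
                # Latest start of motif k-1 that ends before position i.
                last = i - widths[k - 1]
                prior = prefix[last] if last >= 0 else 0.0
            row[i] = w_k[i] * prior
        F.append(row)
    return F


def sample_backward(F, widths, rng):
    """Backward step (sketch).

    Draw the start of the last motif from its marginal, then each
    earlier motif conditional on the start of the motif after it.
    """
    positions = [0] * len(F)
    limit = len(F[-1]) - 1  # latest start still allowed for the current motif
    for k in range(len(F) - 1, -1, -1):
        pos = rng.choices(range(limit + 1), weights=F[k][: limit + 1])[0]
        positions[k] = pos
        if k > 0:
            limit = pos - widths[k - 1]
    return positions


# Illustrative input: two motifs of widths 3 and 2 in a sequence of
# length 10, with uniform weights (every placement equally likely).
widths = [3, 2]
weights = [[1.0] * 10, [1.0] * 10]
F = forward(weights, widths, seq_len=10)
positions = sample_backward(F, widths, random.Random(0))
```

With uniform weights, the forward total `sum(F[-1])` simply counts the admissible placements (21 in this toy case). Under the stated premise that all alignments are equally likely under the null, a count of this kind is one plausible way to see how relaxing the alignment conditions enlarges the number of alignments that must be accounted for.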

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000058-02
Application #: 5203630
Study Section:
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 2
Fiscal Year: 1995
Total Cost:
Indirect Cost:
Name: National Library of Medicine
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code: