The goal of this project is to develop algorithms for optimally aligning amino acid sequences with protein folding motifs. Folding motifs are alternative polypeptide backbone conformations, represented in the computer as lists of pairwise residue contacts. Alignments of a sequence with a folding motif are scored using residue contact potentials, or empirical free energy functions. Optimal alignments may be found by enumerating all possibilities, but we wish to develop rapid methods suitable for searching a conformer database. We have developed statistics which adjust for the effects of amino acid composition and sequence length on expected alignment scores. These provide a quantitative basis for ranking alternative alignments. We have also found that pairwise contact potentials may be partitioned into linear and quadratic terms, corresponding to the hydrophobic and pairwise components of residue contact potentials. The hydrophobic term is particularly important in alignment scores, contributing roughly 2/3 of the information. Using exhaustive enumeration, we have also found that scores from the linear and quadratic components are highly correlated. Together, these observations suggest that optimal alignments may be identified rapidly by implicit enumeration. We may use dynamic programming algorithms to rapidly rank alternative alignments with respect to hydrophobic complementarity, and then compute pairwise contact complementarity for only the best alignments identified. This project may lead to a new and practical method for protein structure prediction: prediction by recognition of folding motifs. It is applicable to sequences showing little or no homology with other proteins. Motif recognition may thus detect distant evolutionary relationships, where sequence similarity is low, and may extend the possibilities for molecular modeling.
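The two-stage idea described above (rank candidates cheaply by the linear hydrophobic term, then compute the quadratic pairwise term only for the best few) can be illustrated with a minimal Python sketch. This is not the project's method: it replaces dynamic programming over gapped alignments with a plain scan over ungapped placements, and every name and number in it (`HYDROPHOBICITY`, `PAIR_TERM`, `rank_placements`, the toy contact list) is a hypothetical stand-in chosen for illustration. Higher scores are assumed to mean better complementarity.

```python
from typing import Dict, FrozenSet, List, Tuple

Contact = Tuple[int, int]  # a pair of motif positions that are in contact

# Hypothetical toy parameters; NOT values from the project.
HYDROPHOBICITY: Dict[str, float] = {
    "A": 0.5, "L": 1.0, "V": 0.9, "K": -1.0, "D": -0.9, "G": 0.0,
}
PAIR_TERM: Dict[FrozenSet[str], float] = {
    frozenset("LV"): 0.3,  # L-V contact
    frozenset("KD"): 0.6,  # K-D contact
    frozenset("L"): 0.4,   # L-L contact
}


def linear_score(window: str, contacts: List[Contact]) -> float:
    """Linear (hydrophobic) term: hydrophobicity weighted by contact count."""
    counts = [0] * len(window)
    for i, j in contacts:
        counts[i] += 1
        counts[j] += 1
    return sum(HYDROPHOBICITY.get(res, 0.0) * n for res, n in zip(window, counts))


def pair_score(window: str, contacts: List[Contact]) -> float:
    """Quadratic (pairwise) term: one residue-pair value per motif contact."""
    return sum(
        PAIR_TERM.get(frozenset((window[i], window[j])), 0.0) for i, j in contacts
    )


def rank_placements(
    sequence: str, motif_len: int, contacts: List[Contact], keep: int = 2
) -> List[Tuple[int, float]]:
    """Rank ungapped placements of the motif along the sequence.

    Stage 1 scores every placement with the cheap linear term only.
    Stage 2 adds the pairwise term for the `keep` best placements.
    Returns (offset, total score) pairs, best first.
    """
    windows = [
        (off, sequence[off:off + motif_len])
        for off in range(len(sequence) - motif_len + 1)
    ]
    best = sorted(
        windows, key=lambda w: linear_score(w[1], contacts), reverse=True
    )[:keep]
    rescored = [
        (off, linear_score(win, contacts) + pair_score(win, contacts))
        for off, win in best
    ]
    return sorted(rescored, key=lambda r: r[1], reverse=True)
```

For example, `rank_placements("KDGLVLAK", 4, [(0, 2), (1, 3), (0, 3)])` scores all five placements with the linear term but evaluates the pairwise term for only two of them. The filter is only as reliable as the correlation between the linear and quadratic scores, which is the observation the abstract reports.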

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000007-01
Application #: 3845098
Support Year: 1
Fiscal Year: 1992
Name: National Library of Medicine
Country: United States