We have developed algorithms for comparison of protein three-dimensional structures. The "VAST" algorithm, for "vector alignment search tool," identifies substructure similarities rapidly by comparing the types, connectivity, and relative orientations of secondary structure elements. Work has focused on three areas: 1) definition of the threshold similarity statistic, 2) validation of sensitivity and specificity, and 3) development of statistical criteria for "optimal" residue-by-residue alignment. VAST ranks substructure similarities by chance-occurrence likelihood. This is calculated as the product of the probabilities that independent element-pair superposition residuals would be observed by chance, as determined by reference to an empirical distribution based on random draws of element pairs in the structural database. A threshold statistic is given as the product of the greatest substructure likelihood and the number of possible substructure choices, as determined by a combinatorial formula involving the number of discrete elements present in each protein. If small, this value is interpretable directly as a test statistic or p-value for significant similarity. VAST has been tested by an all-against-all analysis of known structures. Sensitivity with respect to BLAST "hits" is 99.5%. Specificity in test cases from the literature appears perfect, and comparison with existing non-statistical methods shows that VAST reports fewer hits, the omissions being low-scoring, marginal similarities. Residue-residue alignments are now constructed by a Gibbs sampling scheme, which proved more efficient than Metropolis sampling. The significance of alternative residue alignments is evaluated by reference to an empirical distribution of superposition residuals in random alignments, in order to identify the most surprising alignment.
An analytical density function fitted to empirical random-element self-comparison residuals proved superior to a substructure-specific null model in reproducing expert alignments from the literature, and is also faster. The significance of this work will be in extending the horizon to which evolutionary relationships may be detected, by employing structure comparison in place of sequence comparison alone. The results have been made available to biologists as "structural neighbors" in the "Entrez" browser.
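
The likelihood and threshold statistic described in the abstract can be illustrated with a minimal sketch. It is not the VAST implementation. The function names are hypothetical, the chance probability of a residual is assumed to be the fraction of random-draw residuals that are at least as small, and the number of possible substructure choices is passed in as a plain argument because the abstract does not give the combinatorial formula.

```python
from bisect import bisect_right
from math import prod


def empirical_tail_probability(residual, sorted_random_residuals):
    """Fraction of random-draw residuals that are <= the observed residual.

    Assumption: a smaller superposition residual means a better fit, so this
    fraction stands in for the probability of seeing it by chance.
    """
    count = bisect_right(sorted_random_residuals, residual)
    return count / len(sorted_random_residuals)


def substructure_likelihood(residuals, sorted_random_residuals):
    """Product of per-element-pair chance probabilities (pairs treated as independent)."""
    return prod(
        empirical_tail_probability(r, sorted_random_residuals) for r in residuals
    )


def threshold_statistic(likelihood, n_choices):
    """Likelihood multiplied by the number of possible substructure choices.

    n_choices is supplied by the caller; how it is derived from the element
    counts of the two proteins is not specified here.
    """
    return likelihood * n_choices


# Toy numbers only, not taken from any structural database.
random_residuals = sorted([0.5, 1.0, 1.5, 2.0])
likelihood = substructure_likelihood([1.0, 0.5], random_residuals)
statistic = threshold_statistic(likelihood, n_choices=4)
```

With real data, a residual smaller than every random draw would give a probability of zero under this sketch, so a fitted analytical density, as mentioned in the abstract, would be the more robust choice for the tail.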

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000057-02
Application #
5203629
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
2
Fiscal Year
1995
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code
Shoemaker, Benjamin A; Panchenko, Anna R; Bryant, Stephen H (2006) Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci 15:352-61
Panchenko, Anna R; Wolf, Yuri I; Panchenko, Larisa A et al. (2005) Evolutionary plasticity of protein families: coupling between sequence and structure variation. Proteins 61:535-44
Panchenko, Anna R; Madej, Thomas (2004) Analysis of protein homology by assessing the (dis)similarity in protein loop regions. Proteins 57:539-47
Chen, Jie; Anderson, John B; DeWeese-Scott, Carol et al. (2003) MMDB: Entrez's 3D-structure database. Nucleic Acids Res 31:474-7
Wang, Yanli; Anderson, John B; Chen, Jie et al. (2002) MMDB: Entrez's 3D-structure database. Nucleic Acids Res 30:249-52
Marchler-Bauer, Aron; Panchenko, Anna R; Ariel, Naomi et al. (2002) Comparison of sequence and structure alignments for protein domains. Proteins 48:439-46
Wang, Y; Addess, K J; Geer, L et al. (2000) MMDB: 3D structure data in Entrez. Nucleic Acids Res 28:243-5