We have developed algorithms for the comparison and alignment of protein three-dimensional structures. VAST (Vector Alignment Search Tool) identifies substructure similarities by comparing the types, connectivity, and relative orientations of secondary structure elements (SSEs). Surprising similarities are identified objectively, by considering the number and scores of superimposable SSE pairs in the best alignment and the number of alternative alignments sampled. An optimal residue-by-residue alignment is also identified objectively, as the one with the most surprising combination of superposition residual and number of aligned residues. Work this year has focused on three areas: 1) refinement of the rapid search heuristic, 2) refinement of the statistical significance calculation, and 3) calculation of a complete structural neighbor database for Entrez. VAST is an exhaustive search method in that it considers all possible SSE-pair alignments via a clique detection algorithm and ranks them according to superposition score. We have found that sensitivity improves to recovery of over 99.5% of the similarities detected by BLAST with two simple modifications: relaxation of the geometrical criteria defining edges in the clique graph, and prior parsing of 3D structures into compact domains. The significance test statistic for VAST is the product of the chance-occurrence likelihood of the best SSE alignment and the number of possible alignments in a given domain-pair comparison. We have found that accuracy is improved by explicit convolution of the empirical score distribution for SSE pairs, up to the substructure sizes found in practice, and by exact calculation of the number of alternative alignments via a dynamic programming algorithm. The Entrez neighbor database contains the results of an all-against-all comparison of the 10,000 domain structures in the current 3D database. We have found that VAST requires approximately 0.5 seconds per comparison, a value that makes this calculation possible for the first time. Maintenance of the complete structure neighbor database is also feasible, and we expect that it will be a useful resource for comparative analysis.
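The significance statistic described above (chance-occurrence likelihood of the best SSE alignment, multiplied by the number of possible alignments) can be illustrated with a minimal sketch. This is not the VAST implementation. It assumes integer-binned SSE-pair scores, independence between pairs, and alignments that preserve SSE order; the function names, the toy score distribution, and the counting recurrence are all hypothetical choices made for illustration.

```python
def convolve(p, q):
    """Discrete convolution of two score distributions indexed by integer score."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_score_pmf(pair_pmf, k):
    """Distribution of the summed score of k SSE pairs (k-fold convolution).

    Assumption: the k pair scores are independent draws from pair_pmf.
    """
    pmf = [1.0]  # distribution of an empty sum: score 0 with probability 1
    for _ in range(k):
        pmf = convolve(pmf, pair_pmf)
    return pmf


def tail_probability(pmf, score):
    """Probability of a summed score greater than or equal to `score`."""
    return sum(pmf[score:])


def count_alignments(n_a, n_b, k):
    """Count order-preserving pairings of k SSEs between two domains.

    Assumption: an alignment picks k SSEs from each domain and pairs them
    in sequence order. ways[i][j][m] counts pairings of m SSEs that use
    only the first i SSEs of domain A and the first j SSEs of domain B.
    """
    ways = [[[0] * (k + 1) for _ in range(n_b + 1)] for _ in range(n_a + 1)]
    for i in range(n_a + 1):
        for j in range(n_b + 1):
            ways[i][j][0] = 1  # one way to align nothing
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            for m in range(1, k + 1):
                ways[i][j][m] = (
                    ways[i - 1][j][m]            # SSE i of A unused
                    + ways[i][j - 1][m]          # SSE j of B unused
                    - ways[i - 1][j - 1][m]      # both unused, counted twice
                    + ways[i - 1][j - 1][m - 1]  # SSE i paired with SSE j
                )
    return ways[n_a][n_b][k]


def significance_statistic(pair_pmf, k, score, n_a, n_b):
    """Tail probability of the best alignment times the number of alternatives."""
    p_chance = tail_probability(sum_score_pmf(pair_pmf, k), score)
    return p_chance * count_alignments(n_a, n_b, k)


# Toy example: each SSE pair scores 0 or 1 with equal probability.
toy_pmf = [0.5, 0.5]
print(significance_statistic(toy_pmf, k=2, score=2, n_a=3, n_b=3))
```

Because the statistic multiplies a probability by a count of alternatives, it behaves like an expected number of chance hits and can exceed 1 for small or low-scoring alignments.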

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000057-03
Application #: 2578631
Study Section: Special Emphasis Panel (CBB)
Support Year: 3
Fiscal Year: 1996
Name: National Library of Medicine
Country: United States