We have developed computer methods to compare a protein's sequence with a library of "folds" from the structural database. The sequence is "threaded" through alternative structures, and those most compatible are identified by energy calculations, using contact potentials. Since they directly detect structural similarity, threading methods can identify very distant evolutionary relationships that may be undetectable by sequence comparison. Research this year has focused on testing the core-element threading method, in blind predictions and control experiments, and on algorithmic improvements to increase sensitivity. Control experiments using known structures identified thresholds for successful fold recognition and accurate modeling: the similar "core" substructure must comprise 60% or more of the protein and must superpose to a residual of 2.5 Angstroms or less, such that a large fraction of contacts is preserved. Analysis of predictions for the 1996 CASP2 workshop (Critical Assessment of Structure Prediction) confirmed this conclusion. Structural similarity can be less extensive in some cases of distant relationship, however, and several improvements to increase sensitivity have been considered. New definitions of the "core" of database structures, according to the regions superimposable in homologs with known structures, have been shown to reduce false negatives in threading predictions. Combination of contact potentials with sequence-motif scores was also shown to increase sensitivity in difficult recognition problems. Use of rigorous p-value calculations was shown to reduce false positives. With these improvements, fold recognition may be expected to reliably detect a greater proportion of distant evolutionary relationships.
This was demonstrated at the 1998 CASP3 workshop, where the NCBI team was awarded "first place" in fold recognition among more than 90 international groups entering the competition. The threading methods developed in this project are now being applied to the construction of a conserved domain database (CDD). Seed domain alignments, derived from sequence comparison, are mapped onto known 3D structures and compared with 3D structure alignments to define a core-structure alignment for a sample of representative domains. These alignments are validated by threading calculations, and additional representative sequences detected by PSI-BLAST scanning are merged into the alignment by threading. CDD alignments serve as a protein classification system for public information retrieval services. Domains with conserved structure and function are easily identified, and visualization of the resulting sequence/structure alignments provides a detailed annotation of structure-function relationships.
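
The recognition thresholds reported from the control experiments (a shared core covering at least 60% of the protein, superposing to 2.5 Angstroms or less) can be illustrated with a minimal Python sketch. This is not the project's software: the function names, the input layout, and the assumption that the two core coordinate sets are already optimally superposed are hypothetical choices made for illustration only.

```python
import math

# Thresholds quoted in the abstract; all names below are illustrative assumptions.
MIN_CORE_FRACTION = 0.60   # core must cover 60% or more of the protein
MAX_CORE_RMSD = 2.5        # superposition residual in Angstroms


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the two coordinate sets are ALREADY superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def meets_recognition_thresholds(core_a, core_b, protein_length):
    """Return True if the shared core satisfies both quoted thresholds."""
    core_fraction = len(core_a) / protein_length
    return core_fraction >= MIN_CORE_FRACTION and rmsd(core_a, core_b) <= MAX_CORE_RMSD
```

A caller would pass the equivalent core residues from the query model and the database structure, one coordinate per residue, together with the full length of the query protein. A real pipeline would first compute the optimal superposition, which this sketch deliberately leaves out.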

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000045-08
Application #
6432749
Study Section
(CBB)
Project Start
Project End
Budget Start
Budget End
Support Year
8
Fiscal Year
2000
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code
Chakrabarti, Saikat; Lanczycki, Christopher J; Panchenko, Anna R et al. (2006) Refining multiple sequence alignments with conserved core regions. Nucleic Acids Res 34:2598-606
Marchler-Bauer, Aron; Anderson, John B; Cherukuri, Praveen F et al. (2005) CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res 33:D192-6
Wheeler, David L; Barrett, Tanya; Benson, Dennis A et al. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 33:D39-45
Kann, Maricel G; Thiessen, Paul A; Panchenko, Anna R et al. (2005) A structure-based method for protein sequence alignment. Bioinformatics 21:1451-6
Panchenko, Anna R; Kondrashov, Fyodor; Bryant, Stephen (2004) Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 13:884-92
Panchenko, Anna R (2003) Finding weak similarities between proteins by sequence profile comparison. Nucleic Acids Res 31:683-9
Marchler-Bauer, Aron; Anderson, John B; DeWeese-Scott, Carol et al. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 31:383-7
Marchler-Bauer, Aron; Panchenko, Anna R; Ariel, Naomi et al. (2002) Comparison of sequence and structure alignments for protein domains. Proteins 48:439-46
Marchler-Bauer, Aron; Panchenko, Anna R; Shoemaker, Benjamin A et al. (2002) CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res 30:281-3
Panchenko, Anna R; Bryant, Stephen H (2002) A comparison of position-specific score matrices based on sequence and structure alignments. Protein Sci 11:361-70
