We have developed computer methods to predict protein three-dimensional structure by recognizing similar "folds" in the structural database. A protein's sequence is "threaded" through alternative conformations, and those most compatible with it are identified by approximate free-energy calculation using contact potentials. Research has focused on three areas: 1) testing the threading method with novel sequences, 2) developing confidence statistics for predictions, and 3) developing improved contact potentials. The "adaptive" threading algorithm, based on block alignment and Gibbs sampling optimization, was tested in blind prediction experiments for the Asilomar workshop on this topic. In both predictions where the unknown protein was accurately represented in our "core" database, the correct model was identified among the top 3, with accurate sequence-structure alignment. In two other predictions, test statistics indicated low confidence. These proteins proved to have less extensive similarity to known folds and were not accurately represented in the core database. The test statistics developed for these prediction experiments were based on score distributions for shuffled sequences, a procedure that allows us to rigorously correct for bias due to sequence composition and alignment-space size. The prediction experiments also suggested that modest improvements in potentials and core definition will prove important, since some predictions fell just below the confidence threshold. For this reason we have begun developing new potentials based on a more detailed representation of contact-pair geometry, together with more precise definitions of conserved core substructure. The significance of this research is that it may allow three-dimensional modeling of sequences only distantly related to proteins of known structure, and thus suggest hypotheses about their mechanism of action and function.
Our threading prediction for the Obese gene product, for example, has suggested that leptin, implicated in hereditary obesity, is structurally and perhaps functionally similar to the helical cytokines.
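The threading score and the shuffled-sequence confidence statistic described above can be illustrated with a minimal sketch. This sketch makes several hypothetical assumptions. It uses a toy two-letter (H = hydrophobic, P = polar) contact potential in place of a real 20x20 residue table. The core is modeled as ordered contiguous blocks separated by variable-length loops. The names `CONTACT_ENERGY`, `block_placements`, `best_threading` and `shuffle_z_score` are illustrative and do not come from the project. The sketch also swaps exhaustive enumeration of block placements in for the Gibbs sampling optimizer, which is only feasible for tiny examples. Each shuffled sequence is re-optimized over the same alignment space, so the null distribution reflects both composition and alignment-space size.

```python
import random
import statistics

# Hypothetical toy contact potential (H = hydrophobic, P = polar).
# Lower energy = more favourable; real potentials use 20x20 residue tables.
CONTACT_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def block_placements(n, block_lengths):
    """Yield every ordered, non-overlapping tuple of block start positions."""
    def place(k, first_free):
        if k == len(block_lengths):
            yield ()
            return
        remaining = sum(block_lengths[k:])  # room needed for this and later blocks
        for start in range(first_free, n - remaining + 1):
            for rest in place(k + 1, start + block_lengths[k]):
                yield (start,) + rest
    yield from place(0, 0)


def site_residues(sequence, starts, block_lengths):
    """Map each core site, block by block, to the residue threaded onto it."""
    return [sequence[s + o] for s, length in zip(starts, block_lengths) for o in range(length)]


def energy(residues, contacts):
    """Sum pairwise contact energies over the core's contacting site pairs."""
    return sum(CONTACT_ENERGY[(residues[a], residues[b])] for a, b in contacts)


def best_threading(sequence, block_lengths, contacts):
    """Exhaustively find the lowest-energy block placement (tiny problems only)."""
    return min(
        (energy(site_residues(sequence, starts, block_lengths), contacts), starts)
        for starts in block_placements(len(sequence), block_lengths)
    )


def shuffle_z_score(sequence, block_lengths, contacts, n_shuffles=200, seed=0):
    """Z-score of the optimal energy against re-optimized shuffled sequences.

    Shuffling preserves composition; re-optimizing every shuffle over the same
    alignment space makes the null distribution account for its size.
    Negative values mean the real sequence fits better than its shuffles.
    """
    rng = random.Random(seed)
    native, _ = best_threading(sequence, block_lengths, contacts)
    residues = list(sequence)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(best_threading(residues, block_lengths, contacts)[0])
    sd = statistics.pstdev(null)
    return (native - statistics.fmean(null)) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical core: two 3-residue blocks paired antiparallel (hairpin-like).
    contacts = [(0, 5), (1, 4), (2, 3)]
    print(best_threading("HHHPPPHHH", [3, 3], contacts))  # (-3.0, (0, 6))
    print(shuffle_z_score("HHHPPPHHH", [3, 3], contacts))
```

Exhaustive search keeps the example exact and easy to check. For realistic core sizes the number of block placements grows combinatorially, which is why a stochastic optimizer such as Gibbs sampling is used instead.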

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000045-03
Application #: 5203626
Support Year: 3
Fiscal Year: 1995
Name: National Library of Medicine
Country: United States