The large and growing databases of known DNA sequences represent a knowledge base with the power to revolutionize biology, biochemistry, and biotechnology. Without knowledge of a protein's folded conformation, however, it is difficult to answer the most basic questions of what the protein does and how it does it, let alone develop a rational approach to drug design. While the determination of the protein sequence is relatively straightforward, the experimental determination of the protein structure using X-ray crystallography or multidimensional NMR is complicated, time-consuming, and uncertain. In spite of decades of theoretical, computational, and experimental effort, we can neither understand how the protein is able to find its final folded state nor predict a priori what this final state will be. Even partial successes in these endeavors, such as the prediction of a limited set of biologically important proteins, would be highly significant. Energy functions will be developed and optimized for the prediction of the structure of a set of training proteins, using a criterion based on a Bayesian analysis. Generalization to proteins not included in the training set will allow the prediction of the tertiary structure of proteins of biochemical interest, such as the VHR protein, a member of the family of protein phosphatases with dual specificity for tyrosine and serine. Because the Bayesian optimization strategy is a general approach, the use of optimized energy functions represents a powerful and flexible way to combine traditional physicochemical interactions with other interactions that may not have a purely physicochemical interpretation, such as those based on information obtained through the analysis of evolutionary patterns or experimental observations, and those whose purpose is to restrict the conformation space to be searched.
The Bayesian approach will also be used to address an important but conceptually simpler problem, the cost function for the optimal alignment of two sequences, providing a testbed for exploring optimization strategies. By altering the energetics and dynamics, it will be possible to explore various models of protein folding, in an attempt to ascertain the circumstances under which different behaviors are observed, and what consequences of the models might be experimentally observable.
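As an illustration of what a cost function for the optimal alignment of two sequences looks like, the sketch below uses the standard Needleman-Wunsch dynamic program with a linear gap penalty. It is not the project's method: the function name `align` and the parameters `match`, `mismatch`, and `gap` are assumptions chosen for this example. They stand in for the adjustable quantities that an optimization procedure would tune.

```python
def align(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global alignment of strings a and b (Needleman-Wunsch, linear gaps).

    match, mismatch, and gap are illustrative cost-function parameters.
    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, row_a, row_b = align("ACGT", "AGT")
    print(best)
    print(row_a)
    print(row_b)
```

Changing `match`, `mismatch`, or `gap` changes which alignment is optimal, which is why the choice of cost function is itself a quantity to optimize.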

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: First Independent Research Support & Transition (FIRST) Awards (R29)
Project #: 5R29LM005770-03
Application #: 2392816
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Project Start: 1995-04-01
Project End: 2000-03-31
Budget Start: 1997-04-01
Budget End: 1998-03-31
Support Year: 3
Fiscal Year: 1997
Total Cost:
Indirect Cost:

Name: University of Michigan Ann Arbor
Department: Chemistry
Type: Schools of Arts and Sciences
DUNS #: 791277940
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109
Koshi, J M; Goldstein, R A (1996) Probabilistic reconstruction of ancestral protein sequences. J Mol Evol 42:313-20
Govindarajan, S; Goldstein, R A (1996) Why are some proteins structures so common? Proc Natl Acad Sci U S A 93:3341-5
Thompson, M J; Goldstein, R A (1996) Constructing amino acid residue substitution classes maximally indicative of local protein structure. Proteins 25:28-37
Thompson, M J; Goldstein, R A (1996) Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins 25:38-47
Koshi, J M; Goldstein, R A (1995) Context-dependent optimal substitution matrices. Protein Eng 8:641-5