The prediction of the three-dimensional structure of a globular protein from its amino acid sequence and the mechanism by which protein folding occurs are among the most important unsolved problems of contemporary molecular biology. The overall objectives of this proposal are the continued development and refinement of algorithms that not only can predict protein tertiary structure using only sequence information as input but also may provide insights into the folding pathway. To achieve these goals, this proposal focuses on the lattice-based aspects of a hierarchical approach to protein folding. High resolution lattice models of proteins, comprising an alpha-carbon plus a reduced, off-lattice side chain description, will provide the overall folding pathways and folded conformations. The resulting folded lattice structures are estimated to have an alpha-carbon rms deviation of 2-4 angstroms from the native state. Turning to the folding pathways, the predicted molten globule states and their free energy landscape will be characterized in detail. The factors responsible for side chain fixation on passage from the molten globule to the native state will be explored, with particular attention focused on the interplay of protein sequence and side chain packing. Specifically, this proposal will address the following:

(1) A new high coordination lattice model of proteins will be refined, different side chain realizations will be examined, and the dynamic Monte Carlo algorithms will be parallelized.

(2) Better empirical free energy functions will be developed. These include better methods for predicting secondary structure propensities and a generalization of the hydrogen bond scheme to include backbone-side chain hydrogen bonds. To help eliminate misfolded structures, additional very robust knowledge-based rules, such as the requirement that connections in supersecondary structural elements not cross, will be included in the interaction scheme. Sequence-specific tertiary interactions, including a local burial term, pair interactions, and generalized cooperative multibody side chain contact templates, will be self-consistently derived in the presence of predicted secondary structure propensities. Then, a recently developed neural network that can recognize whether 7 by 7 subfragments of side chain contact maps are protein-like will be extended to include sequence-specific preferences for subsequences to adopt specific patterns. This information will be obtained from a neural network trained on both homologous and nonhomologous subsequences that adopt these patterns. Thus, the approach should be general rather than applicable only to homologous sequence fragments.

(3) The folding of representative motifs of globular proteins will be undertaken. The helical proteins include cytochrome c, whose predicted folding pathway will be compared to experiment, myohemerythrin, myoglobin, and complement factor, 1c5a. The mixed motif proteins include ubiquitin, flavodoxin, and PRA isomerase. The beta-proteins include the 16th complement control protein of factor H, 1hcc, alpha-amylase, plastocyanin, and retinol binding protein.

(4) To validate the methodology, additional blind predictions of proteins whose structures are unknown will be undertaken. Likely candidates include rusticyanin and erythropoietin.
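To make the 7 by 7 contact map subfragments mentioned under (2) concrete, the following minimal Python sketch builds a binary contact map from a set of side chain center coordinates and enumerates every 7 by 7 window of it. It is only an illustration under stated assumptions, not the proposal's actual procedure: the function names `contact_map` and `subfragments`, the 6.0 angstrom contact cutoff, and the straight-chain toy coordinates are all hypothetical choices made here, and the neural network that would score each window is not shown.

```python
from math import dist

def contact_map(coords, cutoff=6.0):
    """Binary contact map: 1 where two distinct centers lie within `cutoff`.

    `cutoff` (in angstroms) is an illustrative assumption, not a value
    taken from the proposal.
    """
    n = len(coords)
    return [[1 if i != j and dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)]
            for i in range(n)]

def subfragments(cmap, size=7):
    """Yield (i, j, window) for every size x size window of the map."""
    n = len(cmap)
    for i in range(n - size + 1):
        for j in range(n - size + 1):
            yield i, j, [row[j:j + size] for row in cmap[i:i + size]]

# Toy input: 10 centers on a straight line, 3.8 angstroms apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(10)]
cmap = contact_map(coords)
windows = list(subfragments(cmap))
```

In a scheme of the kind described above, each `window` would be passed to a classifier that judges whether the pattern is protein-like. The sliding-window enumeration is kept separate from the contact definition so that either piece can be replaced independently.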