The Protein Structure Initiative aims to provide an atomic-level structure for essentially any gene sequence. Experimental progress is occurring at multiple locations, providing a wide distribution of structural classes. In response to RFA GM-07-003, we propose to build on previous successes and significantly improve the accuracy of comparative modeling methods for protein structure prediction. Our focus is largely on improving the homology-based predictions that emerge after the initial threading stage, whether sequence identity is high (>30%) or low (<10%). For both classes, Dr. Xu's successful Raptor threading program can generate starting models, but these often have suboptimal stereochemistry. Drs. Sosnick and Freed's folding algorithm will refine and rescore structures using a newly developed, extremely accurate statistical potential and a novel move set that largely solves the conformational sampling problem near the bottom of the native well without resorting to computationally expensive molecular mechanics methods. The just-developed move set allows significant optimization of the backbone dihedral angles according to observed φ,ψ frequencies in the PDB and to interactions including H-bonding, while maintaining the overall backbone trace to within ~1 Å RMSD. Large angular movements are allowed, and the refinement of loops will benefit from our extremely successful strategy for modeling denatured states. Targets include large membrane channel proteins (>300 residues).
In Aim 2, our Ramachandran basin prediction algorithm, which often can predict the native basins with greater than 85% accuracy, will help identify the correct template for threading of low-identity sequences. Recurring local motifs found in folding simulations and the new statistical potential also will be used in template identification. Across nearly every biological discipline, detailed knowledge of protein structure is critical to understanding biological function. High-resolution structural information often is the starting point for describing a human disease at the molecular level and for the mechanistic studies that follow. The major advance outlined in this proposal will greatly expand the database of protein structures and impact nearly every field of health-related research. The proposed research involves a new, synergistic collaboration and novel computational techniques, as called for in the RFA.