This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Structural genomics programs aim to place all protein sequences within 'modeling distance' of a known structure. To take full advantage of this explosion of information, protein structure predictions based on template structures will need to be improved. In the course of examining the protein homology modeling procedures of high-ranking groups from Round 5 of the Critical Assessment of Techniques for Protein Structure Prediction (CASP5), as well as some procedures of our own, we recognized some methods that may advance this particular area of protein structure prediction. As the sequence identity between a protein sequence with unknown structure (the target) and a protein sequence with known structure (the template) decreases, the performance of methods that align target to template also decreases. This decrease is particularly notable when the sequence identity falls below ~35%. If a correct sequence alignment could be achieved, considerably better protein structural models could be constructed. Better models are in turn more likely to be refinable with physics-based models, through low-resolution lattice-based models, all-atom molecular mechanical (MM) models, or both.

To achieve these goals, we first chose to construct a structure prediction 'pipeline' that requires a minimal number of steps. This ensures that each step can later be examined to determine ways to advance the method. The flowchart shown below reveals the methods we will be using and testing this summer in CASP6. The program 'probA' is used to generate alternative alignments.
Typically we will output 100-500 alternative alignments. From these alignments, structural models are constructed with the program MODELLER version 6.2. These structural models are then assessed with the statistical potentials present in ProsaII and possibly others. Perl scripts have been written to enable the flow between programs. The models with the lowest pair interaction energy are then visually examined with VMD. Loop regions in the model that are not built from the template, along with other poorly formed secondary structure elements, will then be refined using enhanced sampling methods with the MMTSB toolset. (A workshop on protein structure prediction using the toolset was held at the PSC in the summer of 2003.) These enhanced sampling methods include simulated annealing of an ensemble of initial structures or replica-exchange simulations. The energies of the resulting structures will then be calculated with the MM-Generalized Born potential, and the structures will be clustered to determine those closest to the native. Initial results are available at www.psc.edu/biomed/research/biostr.

As our expertise with these software packages increases, we will begin automating this process with an eye towards developing a user gateway that allows the general research community to quickly and effectively develop reasonable homology-based molecular models for their systems. While a number of sites currently offer these services, the results from prior CASP experiments have clearly shown that the quality of the multiple sequence alignment, coupled with the identification of suboptimal alignments, leads to superior results. Thus, we expect this pipeline to be effective.
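
The flow between programs described above (alternative alignments, one model per alignment, a score per model, then selection of the lowest-energy models) can be sketched as follows. This is a minimal illustration in Python, not the Perl scripts mentioned above. The names `ScoredModel`, `run_pipeline`, `build_model`, `score_model`, and `keep` are hypothetical, and the model-building and scoring steps are passed in as plain callables because no real probA, MODELLER, or ProsaII interface is assumed here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ScoredModel:
    """One structural model and its score (all field names are hypothetical)."""
    alignment_id: int   # index of the alternative alignment the model came from
    model_path: str     # location of the model file produced by the builder
    pair_energy: float  # pair interaction energy; lower is treated as better


def run_pipeline(
    alignments: Iterable[str],
    build_model: Callable[[str], str],
    score_model: Callable[[str], float],
    keep: int = 5,
) -> List[ScoredModel]:
    """Build one model per alignment, score each, return the `keep` lowest-energy models.

    `build_model` stands in for the model-building program and returns a model path.
    `score_model` stands in for the statistical-potential assessment and returns an energy.
    Both are assumptions of this sketch and must be supplied by the caller.
    """
    scored = []
    for alignment_id, alignment in enumerate(alignments):
        model_path = build_model(alignment)
        energy = score_model(model_path)
        scored.append(ScoredModel(alignment_id, model_path, energy))

    # Ascending sort puts the lowest pair interaction energy first.
    scored.sort(key=lambda model: model.pair_energy)
    return scored[:keep]
```

Keeping each stage behind its own callable mirrors the stated goal of a pipeline with few steps, each of which can later be examined or replaced on its own. The returned shortlist corresponds to the models that would then go on to visual inspection and refinement.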