This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Computational methods for determining protein structure are essential to the genome project because of the huge number of new sequences for proteins whose properties are completely unknown. A major limitation of current protein structure prediction algorithms is the inadequate quality of the input secondary structure predictions produced by the two major secondary structure servers. These servers use machine learning methods that rely extensively on sequence similarity (homology) and hence use only local sequence information, even though tertiary context is known to influence secondary structure. We have devised a Monte Carlo simulated annealing algorithm, and corresponding computer code, that implements a novel scheme in which secondary and tertiary structure are predicted together in a self-consistent, bootstrap fashion without the use of homology information. Tests run under a TeraGrid development grant show that, for small (fewer than 120 residues) single-domain proteins, our method outperforms the leading servers in secondary structure prediction and yields tertiary structures comparable to those of the best methods while using two orders of magnitude less computer time. This proposal seeks to improve and extend our predictive methods and to increase their computational efficiency.
Proposed projects include, among others, the use of sequence similarity to improve our move set; the introduction of dynamic criteria for secondary structure assignment that vary with the fraction of structure already assigned during a simulation; and improvements to the energy function that enhance predictive quality and enable the treatment of larger proteins. Extensive applications will consider a wide range of proteins with unusual or difficult tertiary structures.
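The abstract names Monte Carlo simulated annealing but does not describe its move set or energy function. As a rough illustration of the general technique only, the sketch below runs a textbook Metropolis simulated annealing loop on a toy, hypothetical problem: assigning each residue one of three states (H = helix, E = strand, C = coil). The `energy` function, the preference table, the neighbour-switch penalty, and the geometric cooling schedule are all illustrative assumptions, not the investigators' method; in particular, the sketch models no tertiary structure and no self-consistent bootstrap.

```python
import math
import random

# Hypothetical three-state alphabet: helix (H), strand (E), coil (C).
STATES = "HEC"


def energy(assignment, prefs, switch_penalty=1.0):
    """Toy energy (an assumption for illustration): sum of per-residue state
    preferences (lower = more favourable) plus a penalty for every state change
    between neighbouring residues, which favours contiguous segments."""
    e = sum(prefs[i][STATES.index(s)] for i, s in enumerate(assignment))
    e += switch_penalty * sum(a != b for a, b in zip(assignment, assignment[1:]))
    return e


def simulated_annealing(prefs, t_start=2.0, t_end=0.01, steps=5000, seed=0):
    """Generic Metropolis simulated annealing over state assignments.
    Returns the lowest-energy assignment seen and its energy."""
    rng = random.Random(seed)
    n = len(prefs)
    current = [rng.choice(STATES) for _ in range(n)]
    e_cur = energy(current, prefs)
    best, e_best = current[:], e_cur
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        # Move: change the state of one randomly chosen residue.
        i = rng.randrange(n)
        trial = current[:]
        trial[i] = rng.choice([s for s in STATES if s != current[i]])
        e_trial = energy(trial, prefs)
        # Metropolis criterion: accept downhill moves; accept uphill moves
        # with probability exp(-dE / T).
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current[:], e_cur
    return "".join(best), e_best


if __name__ == "__main__":
    # Hypothetical preferences: first four residues favour H, last four favour E.
    helix_like = [-1.0, 0.5, 0.0]
    strand_like = [0.5, -1.0, 0.0]
    prefs = [helix_like] * 4 + [strand_like] * 4
    print(simulated_annealing(prefs))
```

High temperatures early in the run let the search escape poor local assignments; as the temperature falls, uphill moves become rare and the walk settles into a low-energy assignment, which is the behaviour annealing relies on.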