Genome sequencing projects are providing a deluge of sequences, but utilizing this information requires knowledge of the function of all the proteins in a given genome. Because biochemical function is determined by a protein's active site structure, protein structures are becoming essential tools for genome functional annotation. This has spurred structural genomics approaches that aim to develop high-throughput protein structure determination methods. Structure prediction is important not only for target selection in structural genomics but also for genome-scale functional prediction. The goal of this project is to improve our TOUCHSTONE tertiary structure prediction algorithm, which employs predicted secondary and tertiary restraints, by addressing the key issues in protein folding: the lack of potentials that distinguish the native state from the myriad of protein-like, misfolded structures, and the lack of effective conformational search algorithms, especially for proteins over 150 residues.
The Specific Aims designed to address these problems are: (1) Comprehensive native structure prediction will be carried out for a representative set of all proteins with solved structures less than 201 residues in length (no pair with more than 35% sequence identity; at present 2282 proteins). This comprehensive test, made possible by our new 4,000 processor PC cluster, will establish the full range of validity of TOUCHSTONE and provide a benchmark against which subsequent improvements can be assessed. (2) The model will be reparameterized using the results and the comprehensive set of decoys provided by (1). Each term in the potential will be examined, its relative weight adjusted, and, where necessary, the term extended or rederived. Because of the large number of native and decoy structures, the greatly improved statistics will allow the derivation of terms in the potential (e.g., 3-body secondary structure dependent pair potentials) that could not previously be derived. (3) Improved protocols to predict the tertiary restraints that permit the folding of complex topologies will be developed. (4) Methods to select native-like structures will be improved, e.g., by a series of simulations in which previously generated clustered structures are used to derive restraints for subsequent simulations. (5) The tertiary structure of all small (<201 residues) proteins in the M. genitalium, E. coli, S. cerevisiae, D. melanogaster, C. elegans, and human genomes will be predicted. (6) The algorithm will undergo continual independent testing, including participation in future CASPs. The overall goal is to establish the range of validity of our ab initio folding algorithms and to provide significant improvements in the state of the art of tertiary structure prediction.