Recent community-wide CASP experiments have witnessed considerable progress in protein structure prediction. State-of-the-art algorithms, including I-TASSER and QUARK, can build structural models with correct folds for ~3/4 of single-domain protein targets, and in more than 80% of cases in the blind tests the template structures can be drawn closer to the native state. Consequently, these highly efficient modeling systems have been widely used by the community to generate protein structure predictions that assist various biological structure and function studies. Nevertheless, computational models for proteins that lack homologous templates generally have low accuracy and are of no practical use to most biomedical studies. For proteins with >150 residues, for instance, ab initio modeling has difficulty constructing correct folds; this is particularly true for beta-proteins, whose complicated topologies are characterized by long-range beta-strand contacts, whereas ab initio folding methods often produce models with simplified short-range beta-sheet pairing that differs from the target. On the other hand, with the rapid accumulation of sequence databases, sequence-based contact predictions, in particular those coupled with co-evolution analysis of multiple sequence alignments, have begun to demonstrate usefulness in assisting ab initio protein folding. The success shown in CASP is, however, anecdotal, because contact prediction methods require a high volume of sequence homologs, which is not available for most non-homologous protein targets.
This renewal proposal seeks to extend the development of the I-TASSER-based algorithms to high-resolution structure prediction, with a focus on improving the quality of non- and distant-homology modeling of medium-to-large sequences (>150 residues), particularly beta-proteins. To this end, a new ensemble-based protocol is proposed that incorporates evolutionary pressure into Monte Carlo simulations to enhance the convergence and efficiency of structural folding. The project will also systematically examine the potential of sequence-based contact map predictions for improving ab initio structure folding, especially for proteins that do not have a high volume of sequence or structural homologs. Building on the strength of the well-established I-TASSER and QUARK methods, this project aims to significantly improve the state of the art of tertiary protein structure prediction, especially for distantly homologous proteins, so that computer-based structure models can be of practical use for drug screening and biochemical functional inference for the majority of proteins in genomes, thereby enhancing the impact of structural bioinformatics on biology and medicine.
In the contemporary drug discovery industry, researchers need detailed knowledge of the three-dimensional structures of proteins to design synthetic compounds that fight various human diseases. However, the structures of many important proteins have not been determined experimentally. The computational algorithms developed in this project aim to predict accurate atomic-level protein structures that can be used for putative compound screening and can therefore have a broad impact on drug discovery and human health.