Considerable progress in protein structure prediction has been demonstrated in recent community-wide CASP experiments. State-of-the-art algorithms, including I-TASSER and QUARK, can build models with correct folds for ~3/4 of single-domain protein targets. In more than 80% of cases in the blind tests, the template structures can be refined closer to the native state. Consequently, these highly efficient modeling systems have been widely used by the community to generate protein structure predictions that assist a variety of biological structure and function studies. Nevertheless, computational models of proteins that lack homologous templates generally have low accuracy and are of no practical use for most biomedical studies. For proteins with >150 residues, for instance, ab initio modeling has difficulty constructing correct folds. This is particularly true for beta-proteins, whose complicated topologies are characterized by long-range beta-strand contacts. Ab initio folding methods, by contrast, often produce models with simplified, short-range beta-sheet pairing that differs from the native structure. On the other hand, with the rapid growth of sequence databases, sequence-based contact predictions have begun to show their usefulness in assisting ab initio protein folding, in particular when they are coupled with co-evolution analysis of multiple sequence alignments. As shown in CASP, however, this success remains anecdotal. Contact prediction methods require a large number of sequence homologs, and these are not available for most non-homologous protein targets.
This renewal proposal seeks to extend the I-TASSER-based algorithms to high-resolution structure prediction. It focuses on improving the quality of non-homology and distant-homology modeling for medium-to-large proteins (>150 residues), particularly beta-proteins. To this end, a new ensemble-based protocol is proposed that incorporates evolutionary pressure into Monte Carlo simulations to enhance the convergence and efficiency of structural folding. The project will also systematically examine the potential of sequence-based contact map predictions to improve ab initio structure folding, especially for proteins that lack a large number of sequence or structural homologs. Building on the strengths of the well-established I-TASSER and QUARK methods, this project aims to significantly advance the state of the art in protein tertiary structure prediction, especially for distantly homologous proteins. The goal is for computational structure models to become of practical use for drug screening and biochemical function inference for the majority of proteins in genomes. This would enhance the impact of structural bioinformatics on biology and medicine.

Public Health Relevance

In the contemporary drug discovery industry, researchers rely on detailed knowledge of the three-dimensional structures of proteins to design synthetic compounds against various human diseases. However, the structures of many important proteins have not been solved experimentally. The computational algorithms developed in this project aim to predict accurate atomic-level protein structures that can be used to screen candidate compounds, and therefore to have a broad impact on drug discovery and human health.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM083107-13
Application #: 9731609
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Lyster, Peter
Project Start: 2008-04-01
Project End: 2020-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 13
Fiscal Year: 2019
Organization: University of Michigan Ann Arbor
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Medicine
DUNS #: 073133571
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Publications

Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L et al. (2018) MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping. J Mol Biol 430:2256-2265
Diamond, Justin S; Zhang, Yang (2018) THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes. Database (Oxford) 2018:
Wu, Jiansheng; Zhang, Qiuming; Wu, Weijian et al. (2018) WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest. Bioinformatics 34:2271-2282
Xia, Chun-Qiu; Han, Ke; Qi, Yong et al. (2018) A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 15:1315-1324
Virtanen, Jouko J; Zhang, Yang (2018) MR-REX: molecular replacement by cooperative conformational search and occupancy optimization on low-accuracy protein models. Acta Crystallogr D Struct Biol 74:606-620
Hu, Jun; Li, Yang; Zhang, Yang et al. (2018) ATPbind: Accurate Protein-ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons. J Chem Inf Model 58:501-510
Dong, Runze; Peng, Zhenling; Zhang, Yang et al. (2018) mTM-align: an algorithm for fast and accurate multiple protein structure alignment. Bioinformatics 34:1719-1725
Hu, Jun; Liu, Zi; Yu, Dong-Jun et al. (2018) LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. Bioinformatics 34:2209-2218
Keasar, Chen; McGuffin, Liam J; Wallner, Björn et al. (2018) An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 8:9939
Zhang, Chengxin; Mortuza, S M; He, Baoji et al. (2018) Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 86 Suppl 1:136-151

The 10 most recent of 102 publications are listed.