Recent CASP experiments have witnessed considerable progress in protein structure prediction. State-of-the-art algorithms, including I-TASSER, can build models with the correct fold for ~3/4 of single-domain protein targets, and template models can be driven closer to the native state in more than 80% of cases. As a consequence, highly efficient protein structure modeling systems have been widely used by the biological and medical communities. Nevertheless, the accuracy of computational models for proteins that have only distant-homology templates is usually low, and such models are of no practical use to most biomedical studies. For proteins of >150 residues, ab initio modeling cannot successfully construct the correct fold. This project extends the development of I-TASSER-based algorithms for high-resolution protein structure prediction, with a focus on improving distant-homology modeling and ab initio folding for large proteins. It also seeks to increase modeling accuracy with the aid of sparse and easily accessible experimental data, including small-angle X-ray scattering. Building on the strength of the well-established I-TASSER and QUARK methods, the project aims to significantly improve the state of the art of tertiary protein structure prediction, especially for non- and distant-homology proteins, so that computational structure prediction can be of real use to modern drug screening and biochemical functional inference for the majority of proteins in genomes.

Public Health Relevance

In the contemporary drug discovery industry, scientists need detailed knowledge of the 3-dimensional structures of proteins associated with particular diseases in order to design synthetic compounds that fight those diseases. But the structures of many important proteins have not been determined experimentally. The computer algorithms developed by this project, which are able to generate atomic-level protein structures, will speed up the screening of putative chemical compounds and have a significant impact on drug discovery and public health.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wehrle, Janna P
University of Michigan Ann Arbor
Biostatistics & Other Math Sci
Schools of Medicine
Ann Arbor
United States
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L et al. (2018) MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping. J Mol Biol 430:2256-2265
Diamond, Justin S; Zhang, Yang (2018) THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes. Database (Oxford) 2018:
Wu, Jiansheng; Zhang, Qiuming; Wu, Weijian et al. (2018) WDL-RF: predicting bioactivities of ligand molecules acting with G protein-coupled receptors by combining weighted deep learning and random forest. Bioinformatics 34:2271-2282
Xia, Chun-Qiu; Han, Ke; Qi, Yong et al. (2018) A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 15:1315-1324
Virtanen, Jouko J; Zhang, Yang (2018) MR-REX: molecular replacement by cooperative conformational search and occupancy optimization on low-accuracy protein models. Acta Crystallogr D Struct Biol 74:606-620
Hu, Jun; Li, Yang; Zhang, Yang et al. (2018) ATPbind: Accurate Protein-ATP Binding Site Prediction by Combining Sequence-Profiling and Structure-Based Comparisons. J Chem Inf Model 58:501-510
Dong, Runze; Peng, Zhenling; Zhang, Yang et al. (2018) mTM-align: an algorithm for fast and accurate multiple protein structure alignment. Bioinformatics 34:1719-1725
Hu, Jun; Liu, Zi; Yu, Dong-Jun et al. (2018) LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. Bioinformatics 34:2209-2218
Keasar, Chen; McGuffin, Liam J; Wallner, Björn et al. (2018) An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 8:9939
Zhang, Chengxin; Mortuza, S M; He, Baoji et al. (2018) Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 86 Suppl 1:136-151

Showing the most recent 10 out of 102 publications