Recent CASP experiments have witnessed considerable progress in protein structure prediction. State-of-the-art algorithms, including I-TASSER, can build models with the correct fold for roughly three quarters of single-domain protein targets, and in more than 80% of cases the template models can be driven closer to the native state. As a consequence, highly efficient protein structure modeling systems have been widely used by the biological and medical communities. Nevertheless, for proteins with only distant-homology templates the accuracy of computational models is usually low, and such models are of no practical use to most biomedical studies. For proteins of >150 residues, ab initio modeling cannot successfully construct the correct fold. This project extends the development of I-TASSER-based algorithms for high-resolution protein structure prediction, with a focus on improving distant-homology modeling and ab initio folding for large proteins. It also seeks to increase modeling accuracy with the aid of sparse and easily accessible experimental data, including small-angle X-ray scattering. Building on the strength of the well-established I-TASSER and QUARK methods, the project aims to significantly improve the state of the art in protein tertiary structure prediction, especially for non- and distant-homology proteins, so that computational structure prediction can be of real use to modern drug screening and biochemical function inference for the majority of proteins in genomes.

Public Health Relevance

In the contemporary drug discovery industry, scientists need detailed knowledge of the 3-dimensional structures of proteins associated with particular diseases in order to design synthetic compounds that fight those diseases. However, the structures of many important proteins have not been determined experimentally. The computer algorithms developed in this project, which are able to generate atomic-level protein structures, will speed up the screening of putative chemical compounds and have a significant impact on drug discovery and public health.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Wehrle, Janna P
University of Michigan Ann Arbor
Biostatistics & Other Math Sci
Schools of Medicine
Ann Arbor
United States
Quan, Shu; Wang, Lili; Petrotchenko, Evgeniy V et al. (2014) Super Spy variants implicate flexibility in chaperone action. Elife 3:e01584
Szilagyi, Andras; Zhang, Yang (2014) Template-based structure modeling of protein-protein interactions. Curr Opin Struct Biol 24:10-23
Zhang, Yang (2014) Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins 82 Suppl 2:175-87
Yang, Jianyi; Roy, Ambrish; Zhang, Yang (2013) BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res 41:D1096-103
Xue, Zhidong; Xu, Dong; Wang, Yan et al. (2013) ThreaDom: extracting protein domain boundary information from multiple threading alignments. Bioinformatics 29:i247-56
Mitra, Pralay; Shultis, David; Zhang, Yang (2013) EvoDesign: De novo protein design based on structural and evolutionary profiles. Nucleic Acids Res 41:W273-80
Xu, Dong; Zhang, Yang (2013) Toward optimal fragment generations for ab initio protein structure assembly. Proteins 81:229-39
Xu, Dong; Li, Hua; Zhang, Yang (2013) Protein depth calculation and the use for improving accuracy of protein fold recognition. J Comput Biol 20:805-16
Fan, Yong-Xian; Zhang, Yang; Shen, Hong-Bin (2013) LabCaS: labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields. Proteins 81:622-34
Mitra, Pralay; Shultis, David; Brender, Jeffrey R et al. (2013) An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol 9:e1003298
