Computational prediction of protein structure from the amino acid sequence is one of the most important and challenging problems in bioinformatics and computational biology. In the post-genomic era, the number of protein sequences without experimentally solved structures is growing exponentially, so accurate protein structure prediction methods and tools are urgently needed. Here, we propose to develop an integrated approach to advance protein structure prediction at the one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) levels. At the 1D level, novel information such as domain evolution signals, alternative gene splicing sites, and 2D protein contact maps will be used to predict protein domain boundaries from sequence. At the 2D level, new methods such as residue contact propagation, machine learning boosting, linear programming, and Markov chain Monte Carlo simulation will be used to advance residue-residue contact prediction for both individual domains and whole proteins. At the 3D level, predicted 2D contacts, machine-learning-based fold recognition, and multi-template combination will be used to enhance both template-based and ab initio structure prediction. Finally, knowledge-based statistical machine learning methods and model combination algorithms will be developed to reliably evaluate the quality of predicted protein structural models and to refine those models. One of several innovative aspects of this approach is the integration of 1D, 2D, and 3D predictions so that each level improves the others, using the domain as the common protein structural unit. The 1D, 2D, and 3D protein structure prediction methods will be implemented as user-friendly software packages and web services and released to the scientific community. These tools and web services will be useful for protein structure prediction, structure determination, functional analysis, protein engineering, protein mutagenesis analysis, and protein design.

Public Health Relevance

The project will develop accurate computational methods and tools to support basic biomedical research, including protein structure prediction, protein function analysis, protein design, protein engineering, and structure-based drug design.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brazhnik, Paul
University of Missouri-Columbia
Biostatistics & Other Math Sci
Schools of Engineering
United States

Publications

Li, Jilong; Bhattacharya, Debswapna; Cao, Renzhi et al. (2014) The MULTICOM protein tertiary structure prediction system. Methods Mol Biol 1137:29-41
Jo, Taeho; Cheng, Jianlin (2014) Improving protein fold recognition by random forest. BMC Bioinformatics 15 Suppl 11:S14
Deng, Xin; Cheng, Jianlin (2014) MSACompro: improving multiple protein sequence alignment by predicted structural features. Methods Mol Biol 1079:273-83
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin (2014) Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment. BMC Struct Biol 14:13
Khoury, George A; Liwo, Adam; Khatib, Firas et al. (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850-68
Deng, Xin; Cheng, Jianlin (2014) Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information. BMC Bioinformatics 15:252
Cao, Renzhi; Wang, Zheng; Wang, Yiheng et al. (2014) SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics 15:120
Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen et al. (2013) A large-scale evaluation of computational protein function prediction. Nat Methods 10:221-7
Bhattacharya, Debswapna; Cheng, Jianlin (2013) 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins 81:119-31
Wang, Zheng; Cao, Renzhi; Cheng, Jianlin (2013) Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks. BMC Bioinformatics 14 Suppl 3:S3

The 10 most recent of the project's 26 publications are listed above.