Computational prediction of protein structure from the amino acid sequence is one of the most important and challenging problems in bioinformatics and computational biology. With the exponential growth in the number of protein sequences lacking solved structures in the post-genomic era, accurate protein structure prediction methods and tools are urgently needed. Here, we propose to develop an integrated approach to advance protein structure prediction at the 1-dimensional (1D), 2-dimensional (2D), and 3-dimensional (3D) levels. At the 1D level, novel information such as domain evolution signals, alternative gene splicing sites, and 2D protein contact maps will be used to predict protein domain boundaries from sequence. At the 2D level, new methods such as residue contact propagation, machine learning boosting, linear programming, and Markov chain Monte Carlo simulations will be used to advance residue-residue contact prediction for a domain or a whole protein. At the 3D level, 2D contact prediction, fold recognition via machine learning, and multi-template combination will be used to enhance both template-based and ab initio structure prediction. Finally, knowledge-based statistical machine learning methods and model combination algorithms will be developed to reliably evaluate and refine the quality of predicted protein structural models. One of several innovative aspects of this approach is the integration of 1D, 2D, and 3D predictions so that they improve one another through the protein structural unit, the domain. The 1D, 2D, and 3D protein structure prediction methods will be implemented as user-friendly software packages and web services released to the scientific community. These tools and web services will be useful for protein structure prediction, structure determination, functional analysis, protein engineering, protein mutagenesis analysis, and protein design.
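To make the link between the 2D and 1D levels concrete, the following minimal sketch derives a residue-residue contact map from C-alpha coordinates and then scores candidate domain boundaries by how few contacts cross each split. It is an illustration only, not the project's method: the 8 Å distance threshold, the sequence-separation cutoff, the function names, and the toy coordinates are all assumptions made for this example.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=3):
    """Build a symmetric boolean contact map from C-alpha coordinates.

    Two residues are in contact if they are at least `min_separation`
    positions apart in sequence and closer than `threshold` angstroms.
    Both cutoffs are assumed values for this sketch.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap

def boundary_scores(cmap, min_domain_len=3):
    """Score each split position k by the density of contacts crossing it.

    A split at k separates residues [0, k) from residues [k, n).
    Lower scores mean fewer inter-segment contacts.
    """
    n = len(cmap)
    scores = {}
    for k in range(min_domain_len, n - min_domain_len + 1):
        crossing = sum(cmap[i][j] for i in range(k) for j in range(k, n))
        scores[k] = crossing / (k * (n - k))
    return scores

def predict_boundary(cmap, min_domain_len=3):
    """Return the split position with the lowest crossing-contact density."""
    scores = boundary_scores(cmap, min_domain_len)
    return min(scores, key=scores.get)

# Toy input (hypothetical): two compact 5-residue clusters 50 angstroms apart.
cluster = [(0, 0, 0), (3, 0, 0), (3, 3, 0), (0, 3, 0), (1.5, 1.5, 3)]
toy_coords = cluster + [(x + 50, y, z) for x, y, z in cluster]

print(predict_boundary(contact_map(toy_coords)))  # prints 5
```

In this toy case the only split with no crossing contacts lies between the two clusters. With predicted rather than observed contacts the same scoring could be applied to a probability matrix, although that extension is not shown here.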
The project will develop accurate computational methods and tools to support basic biomedical research, including protein structure prediction, protein function analysis, protein design, protein engineering, and structure-based drug design.
Li, Jilong; Bhattacharya, Debswapna; Cao, Renzhi et al. (2014) The MULTICOM protein tertiary structure prediction system. Methods Mol Biol 1137:29-41
Jo, Taeho; Cheng, Jianlin (2014) Improving protein fold recognition by random forest. BMC Bioinformatics 15 Suppl 11:S14
Deng, Xin; Cheng, Jianlin (2014) MSACompro: improving multiple protein sequence alignment by predicted structural features. Methods Mol Biol 1079:273-83
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin (2014) Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment. BMC Struct Biol 14:13
Khoury, George A; Liwo, Adam; Khatib, Firas et al. (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850-68
Deng, Xin; Cheng, Jianlin (2014) Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information. BMC Bioinformatics 15:252
Cao, Renzhi; Wang, Zheng; Wang, Yiheng et al. (2014) SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics 15:120
Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen et al. (2013) A large-scale evaluation of computational protein function prediction. Nat Methods 10:221-7
Bhattacharya, Debswapna; Cheng, Jianlin (2013) 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins 81:119-31
Wang, Zheng; Cao, Renzhi; Cheng, Jianlin (2013) Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks. BMC Bioinformatics 14 Suppl 3:S3