Computational prediction of protein structure from the amino acid sequence is one of the most important and challenging problems in bioinformatics and computational biology. With the exponential growth in the number of protein sequences without solved structures in the post-genomic era, accurate protein structure prediction methods and tools are urgently needed. Here, we propose to develop an integrated approach to advance protein structure prediction at the 1-dimensional (1D), 2-dimensional (2D), and 3-dimensional (3D) levels. At the 1D level, novel information such as domain evolution signals, alternative gene splicing sites, and 2D protein contact maps will be used to predict protein domain boundaries from sequence. At the 2D level, new methods such as residue contact propagation, machine learning boosting, linear programming, and Markov Chain Monte Carlo simulations will be used to advance residue-residue contact prediction for a domain or a whole protein. At the 3D level, 2D contact prediction, fold recognition via machine learning, and multi-template combination will be used to enhance both template-based and ab initio structure prediction. Finally, knowledge-based statistical machine learning methods and model combination algorithms will be developed to reliably evaluate and refine the quality of predicted protein structural models. One of several innovative aspects of this approach is the integration of 1D, 2D, and 3D predictions so that they improve one another through the protein structural unit, the domain. The 1D, 2D, and 3D protein structure prediction methods will be implemented as user-friendly software packages and web services released to the scientific community. These tools and web services will be useful for protein structure prediction, structure determination, functional analysis, protein engineering, protein mutagenesis analysis, and protein design.

Public Health Relevance

The project will develop accurate computational methods and tools for basic biomedical research such as protein structure prediction, protein function analysis, protein design, protein engineering, and structure-based drug design.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul
University of Missouri-Columbia
Biostatistics & Other Math Sci
Schools of Engineering
United States
Adhikari, Badri; Nowotny, Jackson; Bhattacharya, Debswapna et al. (2016) ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinformatics 17:517
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao et al. (2016) An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 15:95-108
Cao, Renzhi; Cheng, Jianlin (2016) Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 93:84-91
Lensink, Marc F; Velankar, Sameer; Kryshtafovych, Andriy et al. (2016) Prediction of homoprotein and heteroprotein complexes by protein docking and template-based modeling: A CASP-CAPRI experiment. Proteins 84 Suppl 1:323-48
Cao, Renzhi; Cheng, Jianlin (2016) Protein single-model quality assessment by feature-based probability density functions. Sci Rep 6:23990
Adhikari, Badri; Cheng, Jianlin (2016) Protein Residue Contacts and Prediction Methods. Methods Mol Biol 1415:463-76
Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie et al. (2016) DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 17:495
Li, Jilong; Cheng, Jianlin (2016) A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling. Sci Rep 6:25687
Adhikari, Badri; Trieu, Tuan; Cheng, Jianlin (2016) Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing. BMC Genomics 17:886
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri et al. (2016) Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11. Proteins 84 Suppl 1:247-59

Showing the most recent 10 out of 62 publications