Proteins and their interactions play fundamental roles in all biological processes. Accurate description of protein structures and interactions is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics and drugs. However, there is a large gap between the number of available protein sequences and the number of proteins (and complexes) with solved structures and accurately described interactions, a gap that has to be filled by computational prediction. The long-term goal of this project is to apply statistical machine learning and optimization algorithms to understand the protein sequence-structure-function relationship by analyzing low- and high-throughput sequence, structure, and functional data, and to develop algorithms for structure and function prediction. Our hypothesis is that by developing sophisticated algorithms that take advantage of the growing sequence and structure data, we can model the sequence-structure relationship much more accurately and significantly improve structure and function prediction; for this proposal in particular, prediction of residue (atomic) interaction strength and remote homology detection. This project has produced a few CASP-winning, widely used data-driven algorithms and a web server for monomer protein modeling. This renewal will not only further develop machine learning algorithms (especially deep learning and probabilistic graphical models) for monomer proteins, but also branch out to protein interactions (complexes).
The specific aims are: (1) develop novel structure learning algorithms to predict inter-residue contacts and coevolved residues; (2) develop context-specific, coevolution-based, and distance-dependent statistical potentials using a new machine learning model called Deep Conditional (Markov) Neural Fields (DeepCNF); (3) develop Markov Random Field (MRF) and DeepCNF methods for remote protein (interface/complex) homology detection and fold recognition that make use of the long-range residue interactions predicted by the first two aims. This renewal will lead to further understanding and new models of the protein sequence-structure-function relationship and will yield publicly available software and servers for automated, accurate, quantitative analysis of a wide range of proteins and their interactions. The impact will be multiplied by tens of thousands of users worldwide employing the resulting software and servers to study a wide variety of proteins and interactions relevant to basic biological research and human diseases, in both low- and high-throughput experiments.

Public Health Relevance

Proteins and their interactions play fundamental roles in all biological processes, including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. This proposal develops algorithms to understand the protein sequence-structure-function relationship and to predict protein structures and interactions. The results will lead to a broad range of biomedical applications, such as better understanding of disease processes, development of novel diagnostics and drugs, and improved preventive therapies leading to reduced health care costs.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Krepkiy, Dmitriy
Toyota Technological Institute / Chicago
United States
Sundaram, Laksshman; Gao, Hong; Padigepati, Samskruthi Reddy et al. (2018) Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50:1161-1170
Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15:
Gao, Yujuan; Wang, Sheng; Deng, Minghua et al. (2018) RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning. BMC Bioinformatics 19:100
Wang, Sheng; Sun, Siqi; Xu, Jinbo (2018) Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 86 Suppl 1:67-77
Zhu, Jianwei; Wang, Sheng; Bu, Dongbo et al. (2018) Protein threading using residue co-variation and deep learning. Bioinformatics 34:i263-i273
Wang, Sheng; Fei, Shiyang; Wang, Zongan et al. (2018) PredMP: a web server for de novo prediction and visualization of membrane proteins. Bioinformatics :
Zeng, Hong; Wang, Sheng; Zhou, Tianming et al. (2018) ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res 46:W432-W437
Shao, Mingfu; Ma, Jianzhu; Wang, Sheng (2017) DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics 33:i267-i273
Wang, Sheng; Sun, Siqi; Li, Zhen et al. (2017) Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 13:e1005324
Wozniak, P P; Konopka, B M; Xu, J et al. (2017) Forecasting residue-residue contact prediction accuracy. Bioinformatics 33:3405-3414

Showing the most recent 10 out of 46 publications