New Computational Methods for Data-driven Protein Structure Prediction

Xu, Jinbo

Abstract

Proteins play fundamental roles in all biological processes. Accurate description of protein structure is an important step towards understanding biological life and is highly relevant to the development of therapeutics and drugs. Although experimental structure determination has greatly improved, there is still a very large gap between the number of available protein sequences and the number of solved protein structures, a gap that can only be filled by computational prediction. The long-term goal of this project is to apply machine learning and optimization algorithms to understand protein sequence-structure-function relationships by analyzing sequence, structure and functional data, and to develop data-driven computational methods and tools for structure and function prediction. We believe that by developing sophisticated algorithms to extract knowledge from the growing body of sequence and structure data, we can model the protein sequence-structure relationship very accurately and greatly improve structure and function prediction. This project has already produced several CASP-winning, widely used data-driven algorithms and web servers (http://raptorx.uchicago.edu) for protein structure modeling. This renewal will further develop machine learning (especially deep learning) algorithms for modeling the structure of proteins that lack good templates.
The specific aims are: (1) developing deep learning (DL) algorithms for the prediction of protein contact and distance matrices; (2) developing distance-based algorithms for fast and accurate ab initio folding of proteins without templates; (3) developing DL algorithms for template-based modeling with only weakly similar templates. This renewal will lead to further understanding and new models of the protein sequence-structure relationship and yield publicly available resources for automated, accurate, quantitative analysis of a wide range of proteins. The impact will be multiplied by tens of thousands of users worldwide who employ our web servers to study a wide variety of proteins relevant to basic biological research and human diseases, in both low- and high-throughput experiments.
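
For readers unfamiliar with the quantities named in aim (1), the following is a minimal illustrative sketch of what a distance matrix and a contact map are, computed from known coordinates. It is not the project's prediction method, which infers these matrices from sequence data. The function names, the toy coordinates, and the 8 Angstrom cutoff (a commonly used convention for contacts) are assumptions made for illustration.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points, one point per residue."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dist, threshold=8.0, min_separation=1):
    """Binary matrix: 1 where two distinct residues are closer than `threshold`.

    `threshold` (in Angstroms) and `min_separation` (minimum sequence
    separation) are illustrative assumptions, not values from the grant text.
    """
    n = len(dist)
    return [
        [1 if abs(i - j) >= min_separation and dist[i][j] < threshold else 0
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy chain: four residues spaced 3.8 Angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dist = distance_matrix(coords)
contacts = contact_map(dist)
```

In this toy chain, residues 0 and 2 fall under the cutoff while residues 0 and 3 do not. A predictor in the sense of aim (1) would output an estimate of such a matrix from sequence alone, and aim (2) would then use the predicted distances as restraints for folding.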

Public Health Relevance

Proteins and their interactions play fundamental roles in all biological processes, including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. This proposal develops algorithms to understand the protein sequence-structure relationship and to predict protein structures. The results will support a broad range of biomedical applications, such as better understanding of disease processes, development of novel diagnostics and drugs, and improved preventive therapies leading to reduced health care costs.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM089753-10
Application #: 9817856
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Lyster, Peter

Project Start: 2010-05-14
Project End: 2024-08-31
Budget Start: 2020-09-01
Budget End: 2021-08-31
Support Year: 10
Fiscal Year: 2020

Institution

Name: Toyota Technological Institute / Chicago
DUNS #: 127228927

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Publications

Wang, Sheng; Fei, Shiyang; Wang, Zongan et al. (2018) PredMP: a web server for de novo prediction and visualization of membrane proteins. Bioinformatics :

Zeng, Hong; Wang, Sheng; Zhou, Tianming et al. (2018) ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res 46:W432-W437

Sundaram, Laksshman; Gao, Hong; Padigepati, Samskruthi Reddy et al. (2018) Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50:1161-1170

Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15:

Gao, Yujuan; Wang, Sheng; Deng, Minghua et al. (2018) RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning. BMC Bioinformatics 19:100

Wang, Sheng; Sun, Siqi; Xu, Jinbo (2018) Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 86 Suppl 1:67-77

Zhu, Jianwei; Wang, Sheng; Bu, Dongbo et al. (2018) Protein threading using residue co-variation and deep learning. Bioinformatics 34:i263-i273

Shao, Mingfu; Ma, Jianzhu; Wang, Sheng (2017) DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics 33:i267-i273

Wang, Sheng; Sun, Siqi; Li, Zhen et al. (2017) Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 13:e1005324

Wozniak, P P; Konopka, B M; Xu, J et al. (2017) Forecasting residue-residue contact prediction accuracy. Bioinformatics 33:3405-3414

Showing the most recent 10 out of 46 publications
