New Computational Methods for Data-driven Protein Structure Prediction

Xu, Jinbo

Abstract

Proteins play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life and is also highly relevant medically in the development of therapeutics and drugs. The broad, long-term goal of the project is to develop machine learning methods for data-driven protein structure prediction through two independent but complementary strategies: 1) much more accurate template-based modeling for proteins with remote homologs in the Protein Data Bank (PDB) and 2) better template-free modeling methods for proteins without detectable templates and for improving template-based models.
The specific aims are:
Aim 1) to greatly improve template-based modeling by 1a) improving protein sequence-template alignment using a regression-tree-based nonlinear scoring function, especially when good sequence profiles are unavailable; and 1b) improving fold recognition using a machine learning method to combine both residue-level and atom-level features;
Aim 2) to improve protein conformation sampling in a continuous space, and thus template-free modeling, by three independent but complementary approaches: 2a) modeling the nonlinear sequence-structure relationship using Conditional (Markov) Random Field (CRF) models; 2b) simultaneously sampling secondary and tertiary structure; and 2c) learning structure information from templates.
The core of the project is to develop various CRF models for data-driven protein structure prediction by learning the protein sequence-structure relationship from existing sequence and structure databases. The products of this research include a regression-tree-based CRF model for accurate protein alignment, especially for proteins without close homologs in the PDB or without very good sequence profiles; an SVM model for protein fold recognition; several CRF models for efficient protein conformation sampling in a continuous space; and a complete protein structure prediction software package. The project will also produce a publicly available web server for academic and biomedical users. Protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

Public Health Relevance

Novel protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM089753-01
Application #: 7764110
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter

Project Start: 2010-05-14
Project End: 2015-04-30
Budget Start: 2010-05-14
Budget End: 2011-04-30
Support Year: 1
Fiscal Year: 2010
Total Cost: $268,555

Institution

Name: Toyota Technological Institute at Chicago
DUNS #: 127228927

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Publications

Wang, Sheng; Fei, Shiyang; Wang, Zongan et al. (2018) PredMP: a web server for de novo prediction and visualization of membrane proteins. Bioinformatics :

Zeng, Hong; Wang, Sheng; Zhou, Tianming et al. (2018) ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res 46:W432-W437

Sundaram, Laksshman; Gao, Hong; Padigepati, Samskruthi Reddy et al. (2018) Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 50:1161-1170

Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K et al. (2018) Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 15:

Gao, Yujuan; Wang, Sheng; Deng, Minghua et al. (2018) RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning. BMC Bioinformatics 19:100

Wang, Sheng; Sun, Siqi; Xu, Jinbo (2018) Analysis of deep learning methods for blind protein contact prediction in CASP12. Proteins 86 Suppl 1:67-77

Zhu, Jianwei; Wang, Sheng; Bu, Dongbo et al. (2018) Protein threading using residue co-variation and deep learning. Bioinformatics 34:i263-i273

Shao, Mingfu; Ma, Jianzhu; Wang, Sheng (2017) DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Bioinformatics 33:i267-i273

Wang, Sheng; Sun, Siqi; Li, Zhen et al. (2017) Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLoS Comput Biol 13:e1005324

Wozniak, P P; Konopka, B M; Xu, J et al. (2017) Forecasting residue-residue contact prediction accuracy. Bioinformatics 33:3405-3414

Showing the most recent 10 out of 46 publications
