Proteins play a central role in all biological processes. Like the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life, and it is also highly relevant medically in the development of therapeutics and drugs. The broad, long-term goal of the project is to develop machine learning methods for data-driven protein structure prediction through two independent but complementary strategies: 1) much more accurate template-based modeling for proteins with remote homologs in the Protein Data Bank (PDB) and 2) better template-free modeling methods for proteins without detectable templates and for improving template-based models.
The specific aims are:
Aim 1) to greatly improve template-based modeling by 1a) improving protein sequence-template alignment using a regression-tree-based nonlinear scoring function, especially when good sequence profiles are unavailable; and 1b) improving fold recognition using a machine learning method that combines residue-level and atom-level features;
Aim 2) to improve protein conformation sampling in a continuous space, and thus template-free modeling, by three independent but complementary approaches: 2a) modeling the nonlinear sequence-structure relationship using Conditional (Markov) Random Field (CRF) models; 2b) simultaneously sampling secondary and tertiary structure; and 2c) learning structure information from templates.
The core of the project is to develop various CRF models for data-driven protein structure prediction by learning the protein sequence-structure relationship from existing sequence and structure databases. The products of this research include a regression-tree-based CRF model for accurate protein alignment, especially for proteins without close homologs in the PDB or without very good sequence profiles; an SVM model for protein fold recognition; several CRF models for efficient protein conformation sampling in a continuous space; and a complete protein structure prediction software package. The project will also produce a publicly available web server for academic and biomedical users. Protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes, and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

Public Health Relevance

Novel protein structure prediction will lead to a broad range of biomedical applications, such as the development of novel diagnostics, better understanding of disease processes, and improved preventive therapies leading to reduced health care costs. Protein modeling is also widely applied in the pharmaceutical industry and integrated into most stages of pharmaceutical research.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM089753-04
Application #
8463561
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Lyster, Peter
Project Start
2010-05-14
Project End
2015-04-30
Budget Start
2013-05-01
Budget End
2014-04-30
Support Year
4
Fiscal Year
2013
Total Cost
$256,563
Indirect Cost
$70,270
Name
Toyota Technological Institute / Chicago
Department
Type
DUNS #
127228927
City
Chicago
State
IL
Country
United States
Zip Code
60637
Wang, Sheng; Peng, Jian; Ma, Jianzhu et al. (2016) Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 6:18962
Wang, Sheng; Li, Wei; Zhang, Renyu et al. (2016) CoinFold: a web server for protein contact prediction and contact-assisted protein folding. Nucleic Acids Res 44:W361-6
Wang, Sheng; Li, Wei; Liu, Shiwang et al. (2016) RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res 44:W430-5
Sulakhe, Dinanath; Xie, Bingqing; Taylor, Andrew et al. (2016) Lynx: a knowledge base and an analytical workbench for integrative medicine. Nucleic Acids Res 44:D882-7
Ma, Jianzhu; Wang, Sheng; Wang, Zhiyong et al. (2015) Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics 31:3506-13
Wang, Siyu; Xu, Jinbo; Zeng, Jianyang (2015) Inferential modeling of 3D chromatin structure. Nucleic Acids Res 43:e54
Xie, Bingqing; Agam, Gady; Balasubramanian, Sandhya et al. (2015) Disease gene prioritization using network and feature. J Comput Biol 22:313-23
Sulakhe, Dinanath; Balasubramanian, Sandhya; Xie, Bingqing et al. (2014) Lynx: a database and knowledge extraction engine for integrative medicine. Nucleic Acids Res 42:D1007-12
Yang, Fan; Xu, Jinbo; Zeng, Jianyang (2014) Drug-target interaction prediction by integrating chemical, genomic, functional and pharmacological data. Pac Symp Biocomput :148-59
Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng et al. (2014) An integrative computational approach for prioritization of genomic variants. PLoS One 9:e114903

Showing the most recent 10 out of 32 publications