Development of MUFOLD for Building High-Accuracy Protein Structure Models

Xu, Dong

Abstract

The long-term objective of the proposed project is to provide a comprehensive platform, MUFOLD, for efficient and consistently accurate protein tertiary structure prediction. MUFOLD will help experimental biologists understand the structures and functions of their proteins of interest, thereby facilitating hypothesis generation for experimental design. We will focus on the Funding Opportunity Announcement's second objective, High-Accuracy Models for Remote Homologs of Known Structures, which states that the quality of these models should be close to that of X-ray structures or high-resolution NMR structures, with less than 2 Angstrom RMSD for backbone and side-chain atoms, consistently for all protein targets. Specifically, we will integrate bioinformatics techniques, graph and network theories, computational algorithms, global optimization methods, statistical evaluation, and related approaches to develop a template-based structure prediction system in which model generation, model quality assessment (QA), and model refinement are seamlessly integrated. First, we will exploit in depth the relevant information in the database of known template structures (PDB), together with multi-layer QA methods, to guide efficient model generation in a small, targeted conformation space; this will improve computational efficiency and limit the number of models from which QA methods must select. Second, we will improve the overall discriminating power of QA by integrating the various QA scores of a model with its structural relationships to other models generated for the same target protein. Third, we will develop a population-based model refinement protocol that integrates different levels of QA with efficient model generation techniques to improve the overall quality of models.
Our goals are 1) to improve prediction speed so that the prediction for a target protein of 200-300 residues can be completed in minutes on a multi-core desktop machine; 2) to enhance the ability of QA to select the best models from the generated candidates, reducing the current average GDT-TS loss relative to the best available model from ~10 points to <5 points; 3) to achieve prediction accuracy for remote homologs within 2 Angstrom RMSD for backbone and side-chain atoms on average; and 4) to collaborate with the PSI (Protein Structure Initiative) and others on various applications, such as performing homology modeling for proteins with sequence similarity to newly determined structures, building complete models for incomplete structures, and predicting potential mutation sites to make proteins soluble.
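The accuracy measures named in these goals (RMSD, GDT-TS, and the GDT-TS loss of a QA method) can be illustrated with a minimal sketch. This is not MUFOLD code: the function names are hypothetical, the coordinates are assumed to be matched C-alpha positions that have already been superimposed, and the GDT-TS here is a simplified version computed on that single superposition (the standard measure searches over many superpositions for each cutoff).

```python
import math

def ca_distances(model, reference):
    """Per-residue distances (Angstrom) between matched, pre-superimposed coordinates."""
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same number of residues")
    return [math.dist(a, b) for a, b in zip(model, reference)]

def rmsd(model, reference):
    """Root-mean-square deviation over the matched atoms."""
    d = ca_distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))

def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (0-100): mean percentage of residues within each cutoff.

    Assumption: a single fixed superposition is used instead of the
    per-cutoff superposition search of the standard measure.
    """
    d = ca_distances(model, reference)
    fractions = [sum(1 for x in d if x <= c) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

def selection_loss(true_scores, qa_scores):
    """GDT-TS loss of a QA method on one target.

    true_scores: true GDT-TS of each candidate model.
    qa_scores:   QA-predicted quality of each candidate (higher is better).
    Returns the best true score in the pool minus the true score of the
    model that QA ranks first.
    """
    picked = max(range(len(qa_scores)), key=qa_scores.__getitem__)
    return max(true_scores) - true_scores[picked]

# Toy data (invented for illustration only).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 0.0, 3.0), (12.0, 0.0, 0.0)]

print(gdt_ts(model, reference))                                # 56.25
print(selection_loss([60.0, 72.0, 65.0], [0.9, 0.5, 0.7]))     # 12.0
```

In this toy pool the QA scores rank the first model highest although the second is truly best, giving a 12-point loss; goal 2 above corresponds to bringing the average of this quantity over many targets below 5 points.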

Public Health Relevance

Protein structure prediction can provide valuable information for understanding disease mechanisms and designing drugs. Current computational methods are still far from consistently providing accurate structures. With protein sequences from next-generation sequencing accumulating rapidly, software tools that can significantly improve the accuracy and efficiency of protein structure prediction are urgently needed. Our proposed development will address this need by developing a set of integrated novel methodologies.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 4R01GM100701-05
Application #: 9086384
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Smith, Ward

Project Start: 2012-07-01
Project End: 2017-04-30
Budget Start: 2016-05-01
Budget End: 2017-04-30
Support Year: 5
Fiscal Year: 2016
Total Cost
Indirect Cost

Institution

Name: University of Missouri-Columbia
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 153890272

City: Columbia
State: MO
Country: United States
Zip Code: 65211

Related projects


NIH 2016 R01 GM	Development of MUFOLD for Building High-Accuracy Protein Structure Models Xu, Dong / University of Missouri-Columbia
NIH 2015 R01 GM	Development of MUFOLD for Building High-Accuracy Protein Structure Models Xu, Dong / University of Missouri-Columbia	$278,628
NIH 2014 R01 GM	Development of MUFOLD for Building High-Accuracy Protein Structure Models Xu, Dong / University of Missouri-Columbia
NIH 2013 R01 GM	Development of MUFOLD for Building High-Accuracy Protein Structure Models Xu, Dong / University of Missouri-Columbia	$269,387
NIH 2012 R01 GM	Development of MUFOLD for Building High-Accuracy Protein Structure Models Xu, Dong / University of Missouri-Columbia	$279,411

Publications

Wang, Juexin; Sheridan, Robert; Sumer, S Onur et al. (2018) G2S: a web-service for annotating genomic variants on 3D protein structures. Bioinformatics 34:1949-1950

Fang, Chao; Shang, Yi; Xu, Dong (2018) MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction. Proteins 86:592-598

Rao, R Shyama Prasad; Zhang, Ning; Xu, Dong et al. (2018) CarbonylDB: a curated data-resource of protein carbonylation sites. Bioinformatics 34:2518-2520

Zhang, Ning; Rao, R S P; Salvato, Fernanda et al. (2018) MU-LOC: A Machine-Learning Method for Predicting Mitochondrially Localized Proteins in Plants. Front Plant Sci 9:634

Keasar, Chen; McGuffin, Liam J; Wallner, Björn et al. (2018) An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 8:9939

Liu, Ye; Yu, Zhengfei; Zhu, Jingxuan et al. (2018) Why Is a High Temperature Needed by Thermus thermophilus Argonaute During mRNA Silencing: A Theoretical Study. Front Chem 6:223

Wang, Duolin; Wang, Juexin; Jiang, Yuexu et al. (2017) BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-Expression Analysis. J Mol Biol 429:446-453

Wang, Duolin; Zeng, Shuai; Xu, Chunhui et al. (2017) MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics 33:3909-3916

Zhu, Jingxuan; Lv, Yishuo; Han, Xiaosong et al. (2017) Understanding the differences of the ligand binding/unbinding pathways between phosphorylated and non-phosphorylated ARH1 using molecular dynamics simulations. Sci Rep 7:12439

Zhang, Li; Wang, Han; Yan, Lun et al. (2017) OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method. J Comput Biol 24:217-228

Showing the most recent 10 out of 37 publications
