The long-term objective of the proposed project is to provide a comprehensive platform, MUFOLD, for efficient and consistently accurate protein tertiary structure prediction. MUFOLD will help experimental biologists understand the structures and functions of their proteins of interest, thereby facilitating hypothesis generation for experimental design. We will focus on the Funding Opportunity Announcement's second objective, "High-Accuracy Models for Remote Homologs of Known Structures," which states that "the quality of these models should be close to X-ray structures or high-resolution NMR structures with less than 2 Angstrom RMSD for backbone and side-chain atoms consistently for all protein targets." Specifically, we will combine bioinformatics techniques, graph and network theory, computational algorithms, global optimization methods, and statistical evaluation to develop a template-based structure prediction system in which model generation, model quality assessment (QA), and model refinement are seamlessly integrated. First, we will exploit relevant information from the database of known templates (PDB) in depth, together with multi-layer QA methods, to guide efficient model generation in a small, targeted conformation space. This will improve computational efficiency and limit the number of models from which the QA methods must select. Second, we will improve the overall discerning power of QA by integrating the various QA scores of a model with its structural relationships to other models generated for the same target protein. Third, we will develop a population-based model refinement protocol that integrates different levels of QA with efficient model generation techniques to improve the overall quality of the models.
Our goals are 1) to improve prediction speed so that the prediction for a target protein of 200-300 residues can be finished in minutes on a multi-core desktop machine; 2) to enhance the ability of QA to select the best models from the generated candidates, reducing the current average loss of ~10 GDT-TS points relative to the best available model to <5 points; 3) to achieve a prediction accuracy for remote homologs within 2 Angstrom RMSD for backbone and side-chain atoms on average; and 4) to collaborate with PSI (Protein Structure Initiative) and others on various applications, such as performing homology modeling for proteins with sequence similarity to newly determined structures, building complete models for incomplete structures, and predicting potential mutation sites to make proteins soluble.
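
The accuracy targets above are stated in terms of RMSD, GDT-TS, and the GDT-TS loss of QA selection. As a rough illustration only, these quantities can be sketched in Python as follows. This is a minimal sketch, not the MUFOLD implementation: the function names are hypothetical, coordinates are assumed to be already superimposed, and the GDT-TS here is a simplified single-superposition version (the standard measure searches over many superpositions for each distance cutoff).

```python
import math


def rmsd(model, native):
    """RMSD in Angstroms between two equal-length lists of (x, y, z) atoms.

    Assumption: the two structures are already optimally superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


def gdt_ts_simplified(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS on a 0-100 scale.

    Averages, over the four cutoffs, the percentage of residues whose
    model position lies within the cutoff of the native position.
    Assumption: one fixed superposition is used for all cutoffs.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    dists = [math.dist(a, b) for a, b in zip(model, native)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def selection_loss(gdt_by_model, selected):
    """GDT-TS points lost by picking `selected` instead of the best candidate."""
    return max(gdt_by_model.values()) - gdt_by_model[selected]


if __name__ == "__main__":
    # Toy C-alpha coordinates, invented purely for illustration.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]
    print(f"RMSD: {rmsd(model, native):.2f} Angstrom")
    print(f"Simplified GDT-TS: {gdt_ts_simplified(model, native):.2f}")

    # Hypothetical GDT-TS values for three candidate models of one target.
    candidates = {"m1": 62.0, "m2": 71.5, "m3": 55.0}
    print(f"Selection loss if QA picks m1: {selection_loss(candidates, 'm1'):.1f}")
```

In these terms, goal 2 corresponds to bringing the average `selection_loss` over targets from about 10 points to below 5, and goal 3 to an average `rmsd` below 2 Angstroms.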

Public Health Relevance

Protein structure prediction can provide valuable information for understanding disease mechanisms and designing drugs. Current computational methods are still far from consistently providing accurate structures. With protein sequences derived from next-generation sequencing accumulating rapidly, software tools that can significantly improve the accuracy and efficiency of protein structure prediction are urgently needed. Our proposed development will address this need with a set of integrated, novel methodologies.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM100701-02
Application #: 8469528
Study Section: Special Emphasis Panel (ZRG1-BCMB-S (02))
Program Officer: Smith, Ward
Project Start: 2012-07-01
Project End: 2017-04-30
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 2
Fiscal Year: 2013
Total Cost: $269,387
Indirect Cost: $86,037

Name: University of Missouri-Columbia
Department: Biostatistics & Other Math Sci
Type: Schools of Engineering
DUNS #: 153890272
City: Columbia
State: MO
Country: United States
Zip Code: 65211

Wang, Duolin; Wang, Juexin; Jiang, Yuexu et al. (2016) BFDCA: A Comprehensive Tool of Using Bayes Factor for Differential Co-expression Analysis. J Mol Biol :
Nguyen, Cuong The; Tanaka, Kiwamu; Cao, Yangrong et al. (2016) Computational Analysis of the Ligand Binding Site of the Extracellular ATP Receptor, DORN1. PLoS One 11:e0161894
Wang, Chao; Zhang, Haicang; Zheng, Wei-Mou et al. (2016) FALCON@home: a high-throughput protein structure prediction server based on remote homologue recognition. Bioinformatics 32:462-4
Wang, Juexin; Shen, Dingding; Xia, Geqing et al. (2016) Differential protein structural disturbances and suppression of assembly partners produced by nonsense GABRG2 epilepsy mutations: implications for disease phenotypic heterogeneity. Sci Rep 6:35294
Wang, Juexin; Joshi, Trupti; Valliyodan, Babu et al. (2015) A Bayesian model for detection of high-order interactions among genetic variants in genome-wide association studies. BMC Genomics 16:1011
He, Zhiquan; Ma, Wenji; Zhang, Jingfen et al. (2015) A New Hidden Markov Model for Protein Quality Assessment Using Compatibility Between Protein Sequence and Structure. Tsinghua Sci Technol 19:559-567
Kang, Jing-Qiong; Shen, Wangzhen; Zhou, Chengwen et al. (2015) The human epilepsy mutation GABRG2(Q390X) causes chronic subunit accumulation and neurodegeneration. Nat Neurosci 18:988-96
Lu, Sha; Yin, Xiaoyan; Spollen, William et al. (2015) Analysis of the siRNA-Mediated Gene Silencing Process Targeting Three Homologous Genes Controlling Soybean Seed Oil Quality. PLoS One 10:e0129010
Zhang, Jiong; Barz, Bogdan; Zhang, Jingfen et al. (2015) Selective refinement and selection of near-native models in protein structure prediction. Proteins 83:1823-35
Yao, Qiuming; Ge, Huangyi; Wu, Shangquan et al. (2014) P³DB 3.0: From plant phosphorylation sites to protein networks. Nucleic Acids Res 42:D1206-13

Showing the most recent 10 out of 25 publications