Predicting the three-dimensional structures of proteins without using known structures from the Protein Data Bank (PDB) as templates (ab initio) remains a grand challenge of computational biology. Whereas template-based modeling is now a mature field, ab initio modeling is a comparatively nascent one, especially for large proteins with complex topologies and multiple domains. The need for advances in ab initio modeling is evident: many protein sequences do not have (recognizable) templates in the PDB, and the pace of experimental structure determination is incommensurate with the scale of the problem. Herein, we propose a new approach to ab initio modeling that combines novel deep learning architectures for predicting inter-residue distances and domain boundaries with robust, iterative optimization methods for constructing tertiary structures from the predicted distances. This project builds on the success of our current R01, particularly the outstanding performance of the Cheng group in the 2018 worldwide protein structure prediction experiment, CASP13, where our MULTICOM suite ranked among the top three tertiary structure predictors, alongside Google DeepMind's AlphaFold. The methods will be implemented as open-source tools for the emerging field of distance-based ab initio protein structure modeling. We will apply the methods to study protein homo-oligomers and self-assemblies, based on our novel discovery that the quaternary structure contacts within homo-oligomers can be predicted by deep learning methods from the co-evolutionary signals embedded in multiple sequence alignments of protein monomers. Furthermore, we will apply the methods to predict the folds, functional sites, superfamilies, and protein-protein interactions of proteins that contain "essential Domains of Unknown Function" (eDUFs), a group of evolutionarily conserved, essential proteins that represents an important uncharted region of protein function/fold space. The predictions for a diverse and representative subset of eDUFs will be experimentally validated through a unique collaboration with the structural biology group of Dr. Tanner.

Public Health Relevance

Three-dimensional protein structure information is indispensable in modern biomedical research, but experimental techniques can resolve only a small fraction of known proteins because of their considerable cost. This project will develop cutting-edge computational methods based on modern artificial intelligence (AI) technology to reliably predict protein structures from sequence information alone. The prediction tools will be applied through collaborations with experimental scientists and disseminated to the community.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
2R01GM093123-09A1
Application #
10051137
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Lyster, Peter
Project Start
2010-06-01
Project End
2024-05-31
Budget Start
2020-09-05
Budget End
2021-05-31
Support Year
9
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Missouri-Columbia
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
153890272
City
Columbia
State
MO
Country
United States
Zip Code
65211
Korasick, David A; White, Tommi A; Chakravarthy, Srinivas et al. (2018) NAD+ promotes assembly of the active tetramer of aldehyde dehydrogenase 7A1. FEBS Lett 592:3229-3238
Adhikari, Badri; Hou, Jie; Cheng, Jianlin (2018) DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 34:1466-1472
Hou, Jie; Adhikari, Badri; Cheng, Jianlin (2018) DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 34:1295-1303
Liu, Li-Kai; Tanner, John J (2018) Crystal Structure of Aldehyde Dehydrogenase 16 Reveals Trans-Hierarchical Structural Similarity and a New Dimer. J Mol Biol :
Adhikari, Badri; Cheng, Jianlin (2018) CONFOLD2: improved contact-driven ab initio protein structure modeling. BMC Bioinformatics 19:22
Korasick, David A; Končitíková, Radka; Kopečná, Martina et al. (2018) Structural and Biochemical Characterization of Aldehyde Dehydrogenase 12, the Last Enzyme of Proline Catabolism in Plants. J Mol Biol :
Adhikari, Badri; Hou, Jie; Cheng, Jianlin (2018) Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning. Proteins 86 Suppl 1:84-96
Keasar, Chen; McGuffin, Liam J; Wallner, Björn et al. (2018) An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12. Sci Rep 8:9939
Korasick, David A; Wyatt, Jesse W; Luo, Min et al. (2017) Importance of the C-Terminus of Aldehyde Dehydrogenase 7A1 for Oligomerization and Catalytic Activity. Biochemistry 56:5910-5919
Cao, Renzhi; Adhikari, Badri; Bhattacharya, Debswapna et al. (2017) QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics 33:586-588

Showing the most recent 10 out of 77 publications