Predicting the three-dimensional structures of proteins without using known structures from the Protein Data Bank (PDB) as templates (ab initio modeling) remains a grand challenge of computational biology. Whereas template-based modeling is now a mature field, ab initio modeling is comparatively nascent, especially for large proteins with complex topologies and multiple domains. The need for advances in ab initio modeling is evident: many protein sequences have no recognizable templates in the PDB, and the pace of experimental structure determination is incommensurate with the scale of the problem. Herein, we propose a new approach to ab initio modeling that combines novel deep learning architectures, which predict inter-residue distances and domain boundaries, with robust, iterative optimization methods that construct tertiary structures from the predicted distances. This project builds on the success of our current R01, particularly the outstanding performance of the Cheng group in the 2018 worldwide protein structure prediction experiment, CASP13, where our MULTICOM suite ranked among the top three tertiary structure predictors, alongside Google DeepMind's AlphaFold. The methods will be implemented as open-source tools for the emerging field of distance-based ab initio protein structure modeling. We will apply the methods to study protein homo-oligomers and self-assemblies, based on our novel discovery that the quaternary structure contacts within homo-oligomers can be predicted by deep learning methods from the co-evolutionary signals embedded in multiple sequence alignments of protein monomers. Furthermore, we will apply the methods to predict the folds, functional sites, superfamilies, and protein-protein interactions of proteins that contain "essential Domains of Unknown Function" (eDUFs), a group of evolutionarily conserved, essential proteins that represents an important uncharted region of protein function/fold space.
The predictions for a diverse and representative subset of eDUFs will be experimentally validated through a unique collaboration with the structural biology group of Dr. Tanner.
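To make the idea of constructing a tertiary structure from inter-residue distances concrete, the following is a minimal illustrative sketch, not the method proposed here. It assumes a complete, noise-free matrix of pairwise distances (for example, between C-alpha atoms) and uses classical multidimensional scaling in place of the iterative optimization described above, which would be needed for noisy or incomplete predicted distances. The function names `coords_from_distances` and `pairwise_distances` are hypothetical and chosen for illustration only.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def coords_from_distances(dist, dim=3):
    """Recover coordinates from a complete distance matrix by classical MDS.

    Assumption: `dist` is symmetric, complete, and (close to) Euclidean.
    The result is defined only up to rotation, reflection, and translation.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the `dim` largest eigenpairs; clip tiny negative eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Usage: rebuild a synthetic 20-point chain from its own distance matrix.
rng = np.random.default_rng(0)
true_coords = rng.normal(size=(20, 3))
dist_matrix = pairwise_distances(true_coords)
model = coords_from_distances(dist_matrix)
```

Because distances carry no information about handedness, a reconstruction of this kind cannot distinguish a structure from its mirror image; resolving that ambiguity requires additional information beyond the distance matrix.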
Three-dimensional protein structure information is indispensable in modern biomedical research, but experimental techniques can resolve only a small fraction of known proteins because of their considerable cost. This project will develop cutting-edge computational methods based on modern artificial intelligence (AI) technology to reliably predict protein structures from sequence information alone. The prediction tools will be applied through collaborations with experimental scientists and disseminated to the community.