The long-term goal of this project is to develop a structure-based approach for the prediction of protein molecular function so that the information provided by both genome sequencing and structural genomics can be more fully exploited. To achieve this overall objective, this proposal further develops a very promising, tightly integrated sequence-to-structure-to-function approach that employs protein structure to predict protein-protein interactions, protein molecular function, and ligand binding sites. It also holds considerable promise for improved ligand screening. In particular, the following Specific Aims are proposed: (1) Monomeric sequence profile-based threading algorithms, which currently fail to find good template structures in the PDB for the ~25% of single domain proteins with very low sequence identity to solved protein structures, will be extended and improved. (2) A purely structure-based version of threading will be developed, as the best contemporary threading algorithms have a strong evolutionary component that limits their structure recognition ability when the target and template proteins are evolutionarily distant or have analogous structures. In that regard, potentials of mean force suitable for structure-based threading will be derived from a new AMBER-related, physics-based atomic potential that shows significant ability to refine structures closer to native. (3) The multimeric structure prediction algorithm, m-TASSER, will be enhanced by improving the accuracy of interfacial side chain contact predictions and by using physics-based interfacial potentials for structure refinement. In addition, by exploiting the fact that the library of single domain protein structures is likely complete, all-against-all docking will provide an estimate of the number of possible dimer complexes of single domain proteins. (4) The FINDSITE structure-based protein molecular function prediction algorithm will be extended and improved.
Included are enhancements of its ligand screening ability based on the insight that, for evolutionarily distant proteins, there are conserved anchor regions in both the protein binding site and the bound ligands that can be exploited for rapid ligand binding pose prediction and screening. (5) EFICAz2, a precise enzyme function inference approach, will be combined with FINDSITE to develop a more powerful ligand screening approach. (6) The entire set of tools developed in Aims 1-5 will be applied to all sequenced proteomes, and the resulting sequence-to-structure-to-function (S2F) database will be made available to the academic community. Whole proteome structure predictions will be combined with EFICAz2 and FINDSITE to identify possible receptors of small regulatory molecules, including the targets of anticancer metabolites, and to provide whole proteome screened ligand libraries, libraries of protein-protein interactions, quaternary structures, and molecular functional annotations. In all cases, large-scale, careful benchmarking will be done. Thus, this project holds the promise of making a significant impact across a wide spectrum of biologically important problems.

Public Health Relevance

The development and whole proteome application of the tightly integrated, protein sequence-to-structure-to-function approach described in this project will be of utility to a broad spectrum of researchers. By assisting in the early stages of drug discovery, the proposed algorithms could have significant therapeutic utility. Also, most of the estimated 650,000 protein-protein interactions in the human interactome are unknown; the predicted protein quaternary structures will provide insights into how these proteins perform their functions.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Wehrle, Janna P
Georgia Institute of Technology
Schools of Arts and Sciences
United States
Skolnick, Jeffrey; Gao, Mu; Zhou, Hongyi (2014) On the role of physics and evolution in dictating protein structure and function. Isr J Chem 54:1176-1188
Khoury, George A; Liwo, Adam; Khatib, Firas et al. (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850-68
Skolnick, Jeffrey; Zhou, Hongyi; Gao, Mu (2013) Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 23:191-7
Skolnick, Jeffrey; Gao, Mu (2013) Interplay of physics and evolution in the likely origin of protein biochemical function. Proc Natl Acad Sci U S A 110:9344-9
Zhou, Hongyi; Skolnick, Jeffrey (2013) FINDSITE(comb): a threading/structure-based, proteomic-scale virtual ligand screening approach. J Chem Inf Model 53:230-40
Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey et al. (2013) Restricted N-glycan conformational space in the PDB and its implication in glycan structure modeling. PLoS Comput Biol 9:e1002946
Gao, Mu; Skolnick, Jeffrey (2013) A comprehensive survey of small-molecule binding pockets in proteins. PLoS Comput Biol 9:e1003302
Gao, Mu; Skolnick, Jeffrey (2013) APoc: large-scale identification of similar protein pockets. Bioinformatics 29:597-604
Gao, Mu; Skolnick, Jeffrey (2012) The distribution of ligand-binding pockets around protein-protein interfaces suggests a general mechanism for pocket formation. Proc Natl Acad Sci U S A 109:3784-9
Gao, Mu; Skolnick, Jeffrey (2011) New benchmark metrics for protein-protein docking methods. Proteins 79:1623-34

Showing the most recent 10 out of 94 publications