Algorithmic assignment of probable function to proteins of previously unknown function

Objectives and Specific Aims: The goal of this project is to extend and apply algorithms that show promise in assigning a probable function to PDB entries of currently unknown function. This should contribute to deriving benefit from the Protein Structure Initiative by "help[ing] researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments."

Research Design and Methods: New protein structures are being determined faster than their biological functions can be assigned. There are currently 2939 entries in the Protein Data Bank with the classification "Unknown Function". A number of computational methods have been developed to provide rapid, inexpensive means of function prediction for these structures, including those that focus on alignment of entire backbones and others that focus on identification and alignment of active site residues based on the unusual charge distributions in protein structures. We have developed a software plug-in for the PyMOL molecular graphics environment, called ProMOL, that relies on the geometric relationships conserved in enzyme catalytic sites. Motifs in ProMOL were created from the active site specifications found in the Catalytic Site Atlas (CSA). Our approach explicitly searches for CSA-defined catalytic site residues according to specific atomic geometry, similar in concept to the CSA JESS templates. This dispenses with the need to filter out confounding elements such as conserved folding domains or ligand binding regions. Extensive testing of structural files from the serine protease and peroxidase families confirmed that the geometric relationships of catalytic residues alone are effective and sufficient for function prediction in protein structures.
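
The idea of matching catalytic residues by atomic geometry can be illustrated with a minimal sketch. This is not ProMOL's implementation or the CSA JESS template format: the `Atom` record, the `find_motif` function, the motif layout, and every distance and tolerance value below are hypothetical and chosen only for illustration. The sketch looks for a set of residues of the required types whose named atoms sit at the required pairwise distances, within a tolerance.

```python
import itertools
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record (not a PyMOL or ProMOL type)."""
    chain: str
    resi: int      # residue number
    resn: str      # residue name, e.g. "SER"
    name: str      # atom name, e.g. "OG"
    xyz: tuple     # (x, y, z) coordinates in angstroms


# Illustrative motif for a Ser-His-Asp triad. The atom choices, distances,
# and tolerance are made-up values, not taken from ProMOL or the CSA.
MOTIF_ATOMS = [("SER", "OG"), ("HIS", "NE2"), ("ASP", "OD1")]
MOTIF_DISTANCES = {(0, 1): 3.0, (1, 2): 5.0, (0, 2): 7.5}
TOLERANCE = 1.0


def find_motif(atoms, motif_atoms, motif_distances, tol):
    """Return every combination of atoms that satisfies the motif geometry."""
    # Candidate atoms for each motif position, selected by residue and atom name.
    candidates = [
        [a for a in atoms if (a.resn, a.name) == key] for key in motif_atoms
    ]
    hits = []
    for combo in itertools.product(*candidates):
        # Each motif position must come from a different residue.
        if len({(a.chain, a.resi) for a in combo}) < len(combo):
            continue
        # Every constrained pair must lie within tol of its target distance.
        if all(
            abs(math.dist(combo[i].xyz, combo[j].xyz) - target) <= tol
            for (i, j), target in motif_distances.items()
        ):
            hits.append(combo)
    return hits


# Toy structure: one plausible triad plus a distant serine that should not match.
toy_atoms = [
    Atom("A", 195, "SER", "OG", (0.0, 0.0, 0.0)),
    Atom("A", 57, "HIS", "NE2", (3.0, 0.0, 0.0)),
    Atom("A", 102, "ASP", "OD1", (7.5, 0.0, 0.0)),
    Atom("A", 300, "SER", "OG", (50.0, 0.0, 0.0)),
]
hits = find_motif(toy_atoms, MOTIF_ATOMS, MOTIF_DISTANCES, TOLERANCE)
```

Because only the named catalytic atoms are compared, nothing about the overall fold or ligand-binding regions enters the search, which is the property the text above relies on. An exhaustive product over candidates is acceptable for a handful of residues; a larger search would likely need spatial indexing.
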
In addition to extensive characterization of serine proteases and peroxidases, we performed a preliminary study of 39 PDB entries classified as "Structural Genomics, Unknown Function" using the Motif Finder in ProMOL, which contains 22 "native" ProMOL motifs along with the corresponding CSA JESS C1C2 motifs and CSA Functional Atom motifs. Of the 39 entries studied, 26 (67%) yielded prediction values of 1 (exact match to an existing template). An active site lacking one residue or containing an extra (outlier) residue was identified for 36 (92%) of the structures. No match was reported in only three of the test cases.

We will expand the set of motifs in ProMOL's Motif Finder, using both newly created ProMOL motifs and existing JESS motifs, to include representatives from the most prominent protein families; increase automation of the process; and then evaluate all PDB entries described as having "unknown function". Entries that show positive correlation will then be further explored using sequence and structure alignment tools. Both software and results will be openly released to the community.
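
The reported percentages can be checked with a short tally. The sketch below assumes that the 36 structures with a near match include the 26 exact matches. That reading is an inference from the fact that 36 + 3 = 39, not something the abstract states. The variable and function names are hypothetical.

```python
# Counts quoted in the abstract.
TOTAL = 39
EXACT = 26            # prediction value of 1
EXACT_OR_NEAR = 36    # assumed cumulative: exact, or off by one residue
NO_MATCH = 3


def percent(count, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


# Under the cumulative reading, these structures matched only approximately.
near_only = EXACT_OR_NEAR - EXACT

summary = {
    "exact": (EXACT, percent(EXACT, TOTAL)),
    "exact_or_near": (EXACT_OR_NEAR, percent(EXACT_OR_NEAR, TOTAL)),
    "near_only": (near_only, percent(near_only, TOTAL)),
    "no_match": (NO_MATCH, percent(NO_MATCH, TOTAL)),
}

for label, (count, pct) in summary.items():
    print(f"{label}: {count} of {TOTAL} ({pct}%)")
```
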

Public Health Relevance

Relevance: One expected benefit of the Protein Structure Initiative (PSI) is that structural descriptions will help researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments; however, even after a three-dimensional structure of a protein has been obtained, the function or functions of that protein are not always apparent. Algorithms that compare salient structural features of proteins of known function to similar features in PSI targets of unknown function can provide helpful guidance in assigning probable functions to those targets. The aim of this project is to use such algorithms to assign probable functions to a significant subset of the PSI targets of unknown function and thereby improve understanding of structure-function relationships.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Academic Research Enhancement Awards (AREA) (R15)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Swain, Amy L
Dowling College
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Andrews, Lawrence C; Bernstein, Herbert J (2016) NearTree, a data structure and a software toolkit for the nearest-neighbor problem. J Appl Crystallogr 49:756-761
Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron et al. (2015) Automated protein motif generation in the structure-based protein function prediction tool ProMOL. J Struct Funct Genomics 16:101-11
McKay, Talia; Hart, Kaitlin; Horn, Alison et al. (2015) Annotation of proteins of unknown function: initial enzyme results. J Struct Funct Genomics 16:43-54
Hanson, Brett; Westin, Charles; Rosa, Mario et al. (2014) Estimation of protein function using template-based alignment of enzyme active sites. BMC Bioinformatics 15:87
McGill, Keith J; Asadi, Mojgan; Karakasheva, Maria T et al. (2014) The geometry of Niggli reduction: SAUC - search of alternative unit cells. J Appl Crystallogr 47:360-364
Andrews, Lawrence C; Bernstein, Herbert J (2014) The geometry of Niggli reduction: BGAOL - embedding Niggli reduction and analysis of boundaries. J Appl Crystallogr 47:346-359
Craig, Paul A; Michel, Lea Vacca; Bateman, Robert C (2013) A survey of educational uses of molecular visualization freeware. Biochem Mol Biol Educ 41:193-205
Bernstein, Herbert J; Craig, Paul A (2010) Efficient molecular surface rendering by linear-time pseudo-Gaussian approximation to Lee-Richards surfaces (PGALRS). J Appl Crystallogr 43:356-361
Mottarella, Scott E; Rosa, Mario; Bangura, Abdul et al. (2010) Conscript: RasMol to PyMOL script converter. Biochem Mol Biol Educ 38:419-22