Algorithmic assignment of probable function to proteins of previously unknown function

Bernstein, Herbert; Craig, Paul

Abstract

Algorithmic assignment of probable function to proteins of previously unknown function
Objectives and Specific Aims: The goal of this project is to extend and apply algorithms that show promise in assigning a probable function to PDB entries of currently unknown function. This should contribute to deriving benefit from the Protein Structure Initiative by "help[ing] researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments."
Research Design and Methods: New protein structures are being determined at a rate faster than their biological function can be assigned. There are currently 2939 entries in the Protein Data Bank with the classification "Unknown Function". A number of computational methods have been developed to provide rapid, inexpensive means of function prediction for these structures, including those that focus on alignment of entire backbones and others that focus on identification and alignment of active site residues based on the unusual charge distributions in protein structures. We have developed a software plug-in for the PyMOL molecular graphics environment called ProMOL that relies on the geometric relationships conserved in enzyme catalytic sites. Motifs in ProMOL were created from the active site specifications found in the Catalytic Site Atlas (CSA) (www.ebi.ac.uk/thornton-srv/databases/CSA/). Our approach explicitly searches for CSA-defined catalytic site residues according to specific atomic geometry, similar in concept to the CSA JESS templates. This dispenses with the need to filter out confounding elements such as conserved folding domains or ligand binding regions. Extensive testing of structural files from the serine protease and peroxidase families confirmed that the geometric relationships of catalytic residues alone are effective and sufficient for function prediction in protein structures.
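The idea of searching for catalytic residues by atomic geometry can be illustrated with a minimal sketch. This is not ProMOL's or JESS's actual implementation: the motif layout, the function name find_motif_matches, the single functional atom per residue, the toy coordinates and the distance tolerance are all assumptions made for illustration. A motif is treated here as a list of residue types plus target distances between them, and a match is any set of distinct residues whose pairwise distances all fall within the tolerance.

```python
import itertools
import math

# Hypothetical motif: residue types and target distances (in angstroms)
# between one functional atom per residue. Values are toy numbers.
MOTIF = {
    "residues": ["SER", "HIS", "ASP"],
    "distances": {(0, 1): 3.0, (1, 2): 4.0, (0, 2): 5.0},
}

# Hypothetical structure: one functional atom per residue.
ATOMS = [
    {"resn": "SER", "resi": 195, "xyz": (0.0, 0.0, 0.0)},
    {"resn": "HIS", "resi": 57, "xyz": (3.0, 0.0, 0.0)},
    {"resn": "ASP", "resi": 102, "xyz": (3.0, 4.0, 0.0)},
    {"resn": "SER", "resi": 300, "xyz": (20.0, 20.0, 20.0)},  # decoy
]


def find_motif_matches(motif, atoms, tolerance=1.0):
    """Return residue-number tuples whose geometry fits the motif."""
    # Candidate atoms for each motif position, selected by residue type.
    slots = [[a for a in atoms if a["resn"] == resn]
             for resn in motif["residues"]]
    matches = []
    for combo in itertools.product(*slots):
        # Each motif position must be filled by a different residue.
        if len({a["resi"] for a in combo}) < len(combo):
            continue
        # Every pairwise distance must be within tolerance of its target.
        fits = all(
            abs(math.dist(combo[i]["xyz"], combo[j]["xyz"]) - target)
            <= tolerance
            for (i, j), target in motif["distances"].items()
        )
        if fits:
            matches.append(tuple(a["resi"] for a in combo))
    return matches


print(find_motif_matches(MOTIF, ATOMS, tolerance=0.5))
# -> [(195, 57, 102)]
```

Because the test uses only residue identity and inter-residue distances, it needs no information about the overall fold, which is the property the abstract highlights. The exhaustive product over candidates is acceptable for motifs of a few residues; a real tool would likely prune the search.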
In addition to extensive characterization of serine proteases and peroxidases, we also performed a preliminary study of 39 PDB entries classified as "Structural Genomics, Unknown Function" using the Motif Finder in ProMOL, which contains 22 "native" ProMOL motifs, along with the corresponding CSA JESS C1C2 motifs and CSA Functional Atom motifs. Of the 39 entries studied, 26 (67%) yielded prediction values of 1 (exact match to an existing template). An active site lacking one residue or containing an extra (outlier) residue was identified for 36 (92%) of the structures. No match was reported in only three of the test cases. We will extend the number of motifs in ProMOL's Motif Finder, using both newly created ProMOL motifs and existing JESS motifs to include representatives from the most prominent protein families, increase automation of the process, and then evaluate all PDB entries described as having "unknown function". Entries that show positive correlation will then be further explored using sequence and structure alignment tools. Both software and results will be openly released to the community.
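The percentages reported for the preliminary study can be checked with a few lines of arithmetic. The counts below are taken from the abstract; the variable names are illustrative, and reading the 36 structures as including the 26 exact matches is an assumption (it is the reading under which 36 matched plus 3 unmatched accounts for all 39 entries).

```python
# Counts reported in the abstract for the 39-entry preliminary study.
total_entries = 39
exact_matches = 26      # prediction value of 1
matched_entries = 36    # assumed to include the exact matches
unmatched_entries = 3


def percent(count, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


print(percent(exact_matches, total_entries))    # -> 67
print(percent(matched_entries, total_entries))  # -> 92
print(matched_entries + unmatched_entries == total_entries)  # -> True
```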

Public Health Relevance

One expected benefit of the Protein Structure Initiative (PSI) is that structural descriptions will help researchers illuminate structure-function relationships and thus formulate better hypotheses and design better experiments; however, even after a three-dimensional structure of a protein has been obtained, the function or functions of that protein are not always apparent. Algorithms that compare salient structural features of proteins of known function with similar features in PSI targets of unknown function can provide helpful guidance in assigning probable functions to those targets. The aim of this project is to use such algorithms to assign probable functions to a significant subset of the PSI targets of unknown function and thereby improve understanding of structure-function relationships.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 3R15GM078077-02S2
Application #: 8775451
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Swain, Amy L

Project Start: 2006-08-01
Project End: 2014-08-31
Budget Start: 2011-09-01
Budget End: 2014-08-31
Support Year: 2
Fiscal Year: 2014
Total Cost: $26,877
Indirect Cost

Institution

Name: Dowling College
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 064724917

City: Oakdale
State: NY
Country: United States
Zip Code: 11769

Related projects


NIH 2014 R15 GM	Algorithmic assignment of probable function to proteins of previously unknown function	Bernstein, Herbert J.; Craig, Paul A. / Dowling College	$26,877
NIH 2014 R15 GM	Algorithmic assignment of probable function to proteins of previously unknown function	Bernstein, Herbert J.; Craig, Paul A. / Dowling College	$15,751
NIH 2012 R15 GM	Algorithmic assignment of probable function to proteins of previously unknown function	Bernstein, Herbert J.; Craig, Paul A. / Dowling College	$107,653
NIH 2011 R15 GM	Algorithmic assignment of probable function to proteins of previously unknown function	Bernstein, Herbert J.; Craig, Paul A. / Dowling College	$437,100
NIH 2009 R15 GM	SBEVSL -- Structural Biology Extensible Visualization Scripting Language	Bernstein, Herbert J. / Dowling College	$50,580
NIH 2006 R15 GM	SBEVSL -- Structural Biology Extensible Visualization Scripting Language	Bernstein, Herbert J. / Dowling College	$216,750

Publications

Craig, Paul A (2018) Lessons from my undergraduate research students. J Biol Chem 293:10447-10452

Andrews, Lawrence C; Bernstein, Herbert J (2016) NearTree, a data structure and a software toolkit for the nearest-neighbor problem. J Appl Crystallogr 49:756-761

McKay, Talia; Hart, Kaitlin; Horn, Alison et al. (2015) Annotation of proteins of unknown function: initial enzyme results. J Struct Funct Genomics 16:43-54

Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron et al. (2015) Automated protein motif generation in the structure-based protein function prediction tool ProMOL. J Struct Funct Genomics 16:101-111

Hanson, Brett; Westin, Charles; Rosa, Mario et al. (2014) Estimation of protein function using template-based alignment of enzyme active sites. BMC Bioinformatics 15:87

McGill, Keith J; Asadi, Mojgan; Karakasheva, Maria T et al. (2014) The geometry of Niggli reduction: SAUC - search of alternative unit cells. J Appl Crystallogr 47:360-364

Andrews, Lawrence C; Bernstein, Herbert J (2014) The geometry of Niggli reduction: BGAOL - embedding Niggli reduction and analysis of boundaries. J Appl Crystallogr 47:346-359

Craig, Paul A; Michel, Lea Vacca; Bateman, Robert C (2013) A survey of educational uses of molecular visualization freeware. Biochem Mol Biol Educ 41:193-205

Bernstein, Herbert J; Craig, Paul A (2010) Efficient molecular surface rendering by linear-time pseudo-Gaussian approximation to Lee-Richards surfaces (PGALRS). J Appl Crystallogr 43:356-361

Mottarella, Scott E; Rosa, Mario; Bangura, Abdul et al. (2010) Conscript: RasMol to PyMOL script converter. Biochem Mol Biol Educ 38:419-422

Showing the most recent 10 out of 11 publications
