Method Development: Efficient Computer Vision Based Algorithms

Nussinov, Ruth

Abstract

The uniqueness of our methodologies derives from viewing protein structures as collections of points (e.g., atom coordinates, or points describing molecular surfaces) in 3D space, disregarding the order of the residues on the chains. Such computer-vision and robotics-based algorithms enable comparisons of protein surfaces, interfaces, or protein cores without being limited by the sequential order. Since the last site visit, we have made substantial progress in the development of new algorithms. Some of these (docking, and binding site comparison and detection) have already been described above. The methods we have developed since the last site visit are: residue-based multiple protein structure comparison (MultiProt); multiple alignment of proteins in their secondary structure representation (MASS); multiple alignment of protein structures in the functional group representation and of their binding sites (MultiBind), and of protein-protein interfaces (MAPPIS); SiteEngine, which carries out small molecule and protein-binding site recognition, and I2ISiteEngine, which carries out pairwise structural comparisons of interfaces; flexible alignment of protein structures (FlexProt); rigid body docking (PatchDock); flexible hinge-bending docking (FlexDock); symmetry docking (SymmDock); combinatorial docking for folding and multimolecular assembly (CombDock); prediction of binding sites using phage display libraries (SiteLight); and MolAxis, which detects channels and cavities in proteins in a highly efficient manner even if their diameter is very small. In addition, using these methods, two nonredundant datasets of protein-protein interfaces have been assembled. The methods are all highly efficient, with state-of-the-art capabilities. I have already discussed the docking methods, SiteEngine and MAPPIS (Multiple Alignment of Protein-Protein InterfaceS). Below I briefly describe FlexProt, MASS, MultiProt and MolAxis.
Most methods for multiple alignment start from pairwise alignment solutions. In contrast, MASS and MultiProt derive multiple alignments from simultaneous superpositions of the input molecules. Further, neither method requires that all input molecules participate in the alignment; both efficiently detect high-scoring partial multiple alignments for every possible number of input molecules. MASS (Multiple Alignment by Secondary Structures) and MultiProt (Multiple Proteins) are fully automated, highly efficient techniques that detect multiple structural alignments of protein structures and the common geometrical cores shared by the input molecules. Furthermore, both methods are sequence-order independent.
MASS is based on a two-level alignment, using both secondary structure and atomic representation. Utilizing secondary structure information aids in filtering out noisy solutions and achieves efficiency and robustness. MASS is capable of detecting nontopological structural motifs, where the secondary structures are arranged in a different order on the chains. Further, MASS is able to detect not only structural motifs shared by all input molecules, but also motifs shared only by subsets of the molecules. We have demonstrated its ability to handle on the order of tens of molecules, to detect nontopological motifs, and to find biologically meaningful alignments within subsets of the input that are not predefined. MASS is available at http://bioinfo3d.cs.tau.ac.il/MASS/.
MultiProt considers protein structures as represented by points in space, where the points are either the C-alpha coordinates or the C-alpha and atoms or geometric center of the side chain. MultiProt is available at http://bioinfo3d.cs.tau.ac.il/MultiProt/. We have illustrated the power of both methods on a range of applications. The order-independence allows MultiProt to be applied to binding sites and protein-protein interfaces, which makes it an extremely useful structural tool.
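The idea of treating structures as unordered point sets can be illustrated with a small pairwise sketch. It is not the MultiProt or MASS algorithm; it is a brute-force toy, assuming only the general computer-vision idea of aligning two point sets through local reference frames built from point triplets and then counting points that coincide, whatever their position in the input. All function names (`local_frame`, `count_matches`, `largest_common_core`) and the tolerance value are hypothetical.

```python
import itertools
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def local_frame(p, q, r, eps=1e-9):
    """Right-handed orthonormal frame anchored at p; None if the triplet is (nearly) collinear."""
    x = sub(q, p)
    n = cross(x, sub(r, p))
    len_x, len_n = math.sqrt(dot(x, x)), math.sqrt(dot(n, n))
    if len_x < eps or len_n < eps:
        return None
    x = tuple(c / len_x for c in x)
    z = tuple(c / len_n for c in n)
    return p, (x, cross(z, x), z)

def to_frame(points, frame):
    """Express every point in the coordinates of the given local frame."""
    origin, axes = frame
    return [tuple(dot(sub(pt, origin), ax) for ax in axes) for pt in points]

def count_matches(a_local, b_local, tol):
    """Greedy one-to-one matching of points closer than tol (a heuristic, not an optimal matching)."""
    used, matched = set(), 0
    for pa in a_local:
        for j, pb in enumerate(b_local):
            if j not in used and math.dist(pa, pb) <= tol:
                used.add(j)
                matched += 1
                break
    return matched

def largest_common_core(a, b, tol=0.5):
    """Try every pair of triplet frames and keep the largest number of coinciding points."""
    best = 0
    for ia in itertools.combinations(range(len(a)), 3):
        frame_a = local_frame(*(a[i] for i in ia))
        if frame_a is None:
            continue
        a_local = to_frame(a, frame_a)
        for ib in itertools.permutations(range(len(b)), 3):
            frame_b = local_frame(*(b[i] for i in ib))
            if frame_b is None:
                continue
            best = max(best, count_matches(a_local, to_frame(b, frame_b), tol))
    return best

# Toy data: b is a rotated (90 degrees about z), translated, reversed copy of a plus one outlier.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 3.0), (2.0, 1.0, 1.0)]
b = [(-y + 10.0, x - 5.0, z + 2.0) for (x, y, z) in reversed(a)] + [(50.0, -40.0, 10.0)]
print(largest_common_core(a, b))
```

The exhaustive search over triplets is only practical for a handful of points; the efficiency the methods above are described as having would require hashing or filtering steps that this sketch deliberately omits.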
FlexProt is a novel technique for the alignment of flexible proteins. Unlike all previous algorithms for structural comparison that allow hinge-bending motions, FlexProt does not require a priori knowledge of the location of the hinge(s). FlexProt carries out the flexible alignment by superimposing the matching rigid subpart pairs, and detects the flexible hinge regions simultaneously. Protein structural analysis requires algorithms that can deal with molecular flexibility. FlexProt efficiently detects maximal congruent rigid fragments in both molecules. Transforming the task into a graph-theoretic problem, it then calculates the optimal arrangement of the previously detected maximal congruent rigid fragments. FlexProt performs a structural comparison of a pair of proteins 300 amino acids long in about seven seconds on a standard desktop PC. FlexProt can be accessed via the web at bioinfo3d.cs.tau.ac.il/FlexProt/.
MolAxis is a freely available, easy-to-use web server for identification of channels that connect buried cavities to the outside of macromolecules and for transmembrane (TM) channels in proteins. Biological channels are essential for physiological processes such as electrolyte and metabolite transport across membranes and enzyme catalysis, and can play a role in substrate specificity. Motivated by the importance of channel identification in macromolecules, we developed the MolAxis server. MolAxis implements state-of-the-art, accurate computational-geometry techniques that reduce the dimensions of the channel-finding problem, rendering the algorithm extremely efficient. Given a protein or nucleic acid structure in the PDB format, the server outputs all possible channels that connect buried cavities to the outside of the protein, or points to the main channel in TM proteins.
For each channel, the gating residues and the narrowest radius, termed the 'bottleneck', are also given, along with a full list of the lining residues and the channel surface in a 3D graphical representation. Users can manipulate advanced parameters and direct the channel search according to their needs. MolAxis is available as a web server or as a stand-alone program at http://bioinfo3d.cs.tau.ac.il/MolAxis.
In addition, we have been developing methods to identify tertiary structures of RNA that are not predefined, using structural comparison techniques. We are applying them to the entire database of currently available RNA structures (NMR and crystal) to derive a clustered nonredundant dataset of RNA tertiary structures, and to identify RNA binding sites on protein surfaces for extruded RNA bases in single-stranded RNA.
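The notion of a channel 'bottleneck' mentioned above for MolAxis can be made concrete with a short sketch. This is not the MolAxis algorithm; it assumes the channel axis is already given as a list of sampled points and that atoms are modelled as spheres, and it simply reports the sample with the smallest free radius. The function names (`clearance`, `bottleneck`) and the toy coordinates and radii are hypothetical.

```python
import math

def clearance(point, atoms):
    """Radius of the largest sphere centered at point that touches no atom sphere."""
    return min(math.dist(point, center) - radius for center, radius in atoms)

def bottleneck(axis_points, atoms):
    """Return (index, radius) of the narrowest sampled position along a channel axis."""
    radii = [clearance(p, atoms) for p in axis_points]
    index = min(range(len(radii)), key=radii.__getitem__)
    return index, radii[index]

# Toy channel along the z axis, lined by two rings of atoms (center, radius).
atoms = [((3.0, 0.0, 0.0), 1.5), ((-3.0, 0.0, 0.0), 1.5),
         ((0.0, 3.0, 0.0), 1.5), ((0.0, -3.0, 0.0), 1.5),
         ((2.5, 0.0, 4.0), 1.5), ((-2.5, 0.0, 4.0), 1.5),
         ((0.0, 2.5, 4.0), 1.5), ((0.0, -2.5, 4.0), 1.5)]
axis_points = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 0.0, 4.0)]

index, radius = bottleneck(axis_points, atoms)
print(index, radius)
```

In this toy case the tighter ring at z = 4 determines the bottleneck. A real channel finder also has to discover the axis itself, which is the hard part that this sketch takes as input.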

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Intramural Research (Z01)
Project #: 1Z01BC010442-07
Application #: 7733033
Support Year: 7
Fiscal Year: 2008
Total Cost: $168,288

Institution

Name: National Cancer Institute Division of Basic Sciences
Country: United States

Related projects


NIH 2008 Z01 CA	Method Development: Efficient Computer Vision Based Algorithms Nussinov, Ruth / National Cancer Institute Division of Basic Sciences	$168,288
NIH 2007 Z01 CA	Method Development: Efficient Computer Vision Based Algorithms Nussinov, Ruth / National Cancer Institute Division of Basic Sciences	$285,362

Publications

Zanuy, David; Ballano, Gema; Jimenez, Ana I et al. (2009) Protein segments with conformationally restricted amino acids can control supramolecular organization at the nanoscale. J Chem Inf Model 49:1623-9

Schneidman-Duhovny, Dina; Dror, Oranit; Inbar, Yuval et al. (2008) Deterministic pharmacophore detection via multiple flexible alignment of drug-like molecules. J Comput Biol 15:737-54

Yaffe, Eitan; Fishelovitch, Dan; Wolfson, Haim J et al. (2008) MolAxis: efficient and accurate identification of channels in macromolecules. Proteins 73:72-86

Shatsky, Maxim; Shulman-Peleg, Alexandra; Nussinov, Ruth et al. (2006) The multiple common point set problem and its application to molecule binding pattern detection. J Comput Biol 13:407-28

Shatsky, Maxim; Nussinov, Ruth; Wolfson, Haim J (2006) Optimization of multiple-sequence alignment based on multiple-structure alignment. Proteins 62:209-17

Wainreb, Gilad; Haspel, Nurit; Wolfson, Haim J et al. (2006) A permissive secondary structure-guided superposition tool for clustering of protein fragments toward protein structure prediction via fragment assembly. Bioinformatics 22:1343-52

Dror, Oranit; Nussinov, Ruth; Wolfson, Haim J (2006) The ARTS web server for aligning RNA tertiary structures. Nucleic Acids Res 34:W412-5

Inbar, Yuval; Benyamini, Hadar; Nussinov, Ruth et al. (2005) Prediction of multimolecular assemblies by multiple docking. J Mol Biol 349:435-47

Schneidman-Duhovny, Dina; Inbar, Yuval; Nussinov, Ruth et al. (2005) Geometry-based flexible and symmetric protein docking. Proteins 60:224-31

Schneidman-Duhovny, Dina; Inbar, Yuval; Nussinov, Ruth et al. (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33:W363-7

Showing the most recent 10 out of 29 publications
