Computational infrastructure for efficient and accurate searching of biomolecules across databases is a foundation of modern biology, biochemistry, pharmacology, and biotechnology. The goal of this project is to develop computational methods and databases that allow fast, real-time, seamless screening of various types of three-dimensional (3D) structural data of proteins and their interacting molecules. The structural data to be searched include 3D protein structures and protein complexes, predicted protein structures, low-resolution protein complexes solved by cryo-electron microscopy, small chemical ligand molecules, and drug molecules. The project employs a mathematical representation of biomolecules that makes it possible to quickly compare and search for biomolecules whose global and local surface shape and properties are similar to those of a query molecule. The project will further extend the molecular representation to searching for interacting molecules by identifying complementarity of shapes and surface properties. The methods to be developed will allow biologists to quickly identify proteins that potentially interact with a query protein, which will help generate testable hypotheses about the molecular mechanisms of diseases by building molecular networks. Moreover, the methods will enable quick searches for ligand molecules and potential drug molecules that fit a target protein.
Biology has entered the informatics era, in which combining different types of big omics data is routinely required to reach a systems-level understanding of the biological function of molecules and cells. To glean useful structural data for biological studies effectively, there is a strong need for computational methods that can quickly and seamlessly search different types of structural data. Establishing efficient methods for searching biomolecular shape and physicochemical properties is essential for capitalizing on the many efforts directed towards determining molecular and cellular structures by structural genomics and other projects. The project will develop computational methods and databases to screen various types of protein structures and their interacting molecules seamlessly and quickly. Using the molecular representation proposed in the project, global and local shapes and surface properties (electrostatic potential, hydrophobicity) of proteins and ligand molecules can be compared very fast. In contrast to conventional 3D structure search methods for biomolecules, which take hours or even more than a day to finish a database search, the methods to be developed will allow real-time searches against large databases. Thus, structural analysis will become as convenient as sequence database searches for biology researchers. The 3D molecule search methods will be applied to identify molecules that interact with a query protein: ligand molecules that would bind to a pocket region of the query protein, as well as interacting proteins. Knowing molecular interactions is critical for understanding the functions of proteins. The key innovations include 1) finding molecules that interact with proteins, i.e., pocket-ligand interactions and protein-protein interactions; and 2) local surface comparisons for functional annotation. The methods developed will be implemented in 3D-Surfer, a one-stop website for biomolecular shape retrieval.
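As a rough illustration of why a compact representation can make database searches fast, the sketch below ranks database entries by the distance between fixed-length numeric descriptors. It is only a minimal sketch under stated assumptions: the abstract does not specify the form of the molecular representation or the similarity measure, so the vector descriptors, the Euclidean distance, the function names, and the toy data are all hypothetical.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors.

    Assumption: each molecule is summarized by a fixed-length list of numbers.
    """
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_database(query, database, top_k=3):
    """Return the top_k (name, distance) pairs closest to the query descriptor."""
    scored = [(name, descriptor_distance(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:top_k]


# Hypothetical toy database; real descriptors would be far longer.
toy_database = {
    "protein_A": [0.9, 0.1, 0.4],
    "protein_B": [0.2, 0.8, 0.5],
    "ligand_C": [0.85, 0.15, 0.35],
}

query_descriptor = [0.88, 0.12, 0.38]
hits = rank_database(query_descriptor, toy_database, top_k=2)
for name, distance in hits:
    print(f"{name}: {distance:.4f}")
```

Each comparison in this sketch is a single pass over two short vectors, with no structural alignment step, which is the kind of property that would let a search scale to large databases.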
The proposed approach can be applied to other types of rapid shape and property comparison, such as 2D and 3D medical images, microscope images, geographical landscapes, and face recognition. Graduate and undergraduate students in biological sciences and computer science will be trained in courses cross-listed among several departments. Several existing programs at Purdue for recruiting minority students and undergraduate students will contribute to broad participation in the project. Overall, the proposed project leverages Purdue University's efforts in interdisciplinary computational life science and engineering.