One of the principal needs of structural genomics is a methodology for automated protein structure comparison and classification. We previously developed a tool, Simplicial Neighborhood Analysis of Protein Packing (SNAPP), for the identification of recurrent sequence-structure motifs in a collection of protein structures. We propose the systematic application of statistical geometry and geometric pattern matching techniques to identify protein family-specific packing patterns (family signatures). We further propose to use these signatures for the comparison and classification of known 3D protein structures. Finally, we aim to demonstrate that some of these structural patterns can be mapped onto the underlying protein sequences to form sequence-specific patterns, and can therefore also be used for sequence annotation and classification. We employ a computational geometry technique known as Delaunay tessellation, which partitions a protein structure into a unique set of quadruplet contacts. This representation reduces tertiary structure to a natural basis set of motifs that may be characteristic of protein structural and functional classes. A broader definition of motifs can be obtained by applying frequent common subgraph mining to collections of protein graphs representing known structural and functional families. To discover structural and functional family-specific motifs and apply them to protein classification and annotation, this proposal is structured around the following Specific Aims:
Aim 1: Develop novel algorithms to identify protein family-specific packing motifs based on frequent common subgraph mining of protein graph families;
Aim 2: Identify specific amino acid packing motifs in diverse protein families and define them as sequence-specific signatures;
Aim 3: Develop methodologies for protein annotation based on family-specific packing motifs.
This project benefits from the collaborative efforts of four investigators with complementary expertise in structural bioinformatics (Tropsha), computational geometry (Snoeyink), data mining (Wang), and high-performance computing (Prins). The proposed methodologies are expected to be robust and efficient enough to apply to large, post-genomic-scale databases of protein structures and sequences. The proposed studies should lead to the discovery of previously unknown patterns of amino acid residues that are important for protein structure and function. Functional annotation of orphan proteins will expand our knowledge of the human proteome. Since proteins are the most common therapeutic targets, our research, aimed at improving our understanding of protein structure-function relationships, should facilitate the discovery of novel targets for drug therapy, thereby contributing to the improvement of human health.
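
As an illustration of the Delaunay tessellation step mentioned in the abstract, the following is a minimal sketch, not the SNAPP implementation. It assumes each residue is reduced to a single 3D point (for example a side-chain centroid; the abstract does not specify the choice) and finds the tessellation by brute force: a set of four points forms a Delaunay tetrahedron when no other point lies inside its circumsphere. The function names, the toy coordinates, and the residue labels are all hypothetical. For real structures a library routine such as `scipy.spatial.Delaunay` would be the practical choice, since the brute-force search below is only feasible for a handful of points.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3):
    """Return (center, squared radius) of the sphere through four points,
    or None if the points are (nearly) coplanar."""
    others = (p1, p2, p3)
    # Equal distance from p0 and pk gives one linear equation per pk:
    #   2 (pk - p0) . x = |pk|^2 - |p0|^2
    rows = [[2 * (pk[j] - p0[j]) for j in range(3)] for pk in others]
    rhs = [sum(c * c for c in pk) - sum(c * c for c in p0) for pk in others]
    d = det3(rows)
    if abs(d) < 1e-9:
        return None
    center = []
    for col in range(3):  # solve the 3x3 system with Cramer's rule
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(det3(m) / d)
    r2 = sum((center[j] - p0[j]) ** 2 for j in range(3))
    return center, r2


def delaunay_quadruplets(points, eps=1e-9):
    """Brute-force Delaunay tessellation: keep every 4-subset of point
    indices whose circumsphere contains no other point."""
    quads = []
    for idx in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in idx))
        if sphere is None:
            continue
        center, r2 = sphere
        empty = all(
            sum((points[k][j] - center[j]) ** 2 for j in range(3)) >= r2 - eps
            for k in range(len(points))
            if k not in idx
        )
        if empty:
            quads.append(idx)
    return quads


def quadruplet_compositions(residues, quads):
    """Map each index quadruplet to its sorted residue-type composition."""
    return [tuple(sorted(residues[i] for i in q)) for q in quads]


# Hypothetical toy input: five residue positions and their types.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, 0.0, 1.0), (2.0, 2.0, 2.0)]
residues = ["ALA", "LEU", "VAL", "ILE", "PHE"]

quads = delaunay_quadruplets(points)
print(quads)
print(quadruplet_compositions(residues, quads))
```

Counting how often each composition occurs across the members of a protein family is one simple way to look for recurrent quadruplet motifs. The subgraph mining described in Aim 1 generalizes this to larger patterns.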

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM068665-04
Application #: 7665373
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (02))
Program Officer: Wehrle, Janna P
Project Start: 2006-08-01
Project End: 2011-07-31
Budget Start: 2009-08-01
Budget End: 2011-07-31
Support Year: 4
Fiscal Year: 2009
Total Cost: $264,449
Indirect Cost:
Name: University of North Carolina Chapel Hill
Department: Pharmacology
Type: Schools of Pharmacy
DUNS #: 608195277
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599
Khashan, Raed; Zheng, Weifan; Tropsha, Alexander (2012) Scoring protein interaction decoys using exposed residues (SPIDER): a novel multibody interaction scoring function based on frequent geometric patterns of interfacial residues. Proteins 80:2207-17
Bandyopadhyay, Deepak; Huan, Jun; Liu, Jinze et al. (2010) Functional neighbors: inferring relationships between nonhomologous protein families using family-specific packing motifs. IEEE Trans Inf Technol Biomed 14:1137-43
Lei, Seak Fei; Huan, Jun (2010) Towards site-based protein functional annotations. Int J Data Min Bioinform 4:452-70
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan et al. (2009) Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des 23:773-84
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan et al. (2009) Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: II. Case studies and applications. J Comput Aided Mol Des 23:785-97
Zhang, Shuxing; Kaplan, Andrew H; Tropsha, Alexander (2008) HIV-1 protease function and structure studies with the simplicial neighborhood analysis of protein packing method. Proteins 73:742-53
Drummond, D Allan; Silberg, Jonathan J; Meyer, Michelle M et al. (2005) On the conservative nature of intragenic recombination. Proc Natl Acad Sci U S A 102:5380-5