The research described in this proposal is intended to provide computational tools and data resources that will enhance the understanding of disease-related Single Nucleotide Polymorphisms (SNPs) and the protein-protein interaction (PPI) pathways they impact, and that will also provide new mechanistic insights into cancer-related signaling pathways. More generally, the research offers fundamentally new approaches to the molecular-level understanding of human disease through two Specific Aims: 1) the structure-enabled annotation of disease-related SNPs; and 2) the molecular-level annotation of cancer protein-protein interactomes. Integral to both aims is the development of new computational tools of broad applicability. The proposed research strategy is based in large part on the PrePPI (Predicting Protein-Protein Interactions) pipeline, which integrates structural and non-structural information using Bayesian statistics to predict the likelihood that two proteins interact, either physically or indirectly. The PrePPI database of about 1.35 million predicted human PPIs has been shown to provide accuracy comparable to that of high-throughput experimental databases but is far larger in scale and scope. PrePPI relies heavily on three-dimensional structural information and is unique in this regard.
Aim 1 focuses on the creation of a database in which all human SNPs are mapped to the protein structures and models contained in PrePPI. PrePPI-predicted PPIs contain information about interfacial residues, which allows the development of a predictive algorithm to determine whether a SNP disrupts a PPI. Different structural features of SNPs will provide the variables for this algorithm, and their contributions will be determined using a Bayesian approach that exploits a positive reference set containing disease-related SNPs and a negative reference set containing benign SNPs.
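One way to picture how positive and negative reference sets can determine the contribution of each structural feature is a likelihood-ratio calculation of the kind used in naive Bayes classifiers. The sketch below is illustrative only and is not the proposal's actual algorithm: the feature names (`at_interface`, `burial`), their discrete bins, the pseudocount, and the assumption that features are conditionally independent are all hypothetical choices made for this example.

```python
from collections import Counter
from math import prod


def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Per-bin ratio P(bin | disease SNP) / P(bin | benign SNP) for one feature.

    A pseudocount is added to every bin so that unseen bins do not
    produce zero or infinite ratios (an assumption of this sketch).
    """
    bins = set(pos_values) | set(neg_values)
    pos_counts, neg_counts = Counter(pos_values), Counter(neg_values)
    pos_total = len(pos_values) + pseudocount * len(bins)
    neg_total = len(neg_values) + pseudocount * len(bins)
    return {
        b: ((pos_counts[b] + pseudocount) / pos_total)
        / ((neg_counts[b] + pseudocount) / neg_total)
        for b in bins
    }


def train(positive_set, negative_set, features):
    """Build one likelihood-ratio table per feature from the two reference sets."""
    return {
        f: likelihood_ratios(
            [snp[f] for snp in positive_set],
            [snp[f] for snp in negative_set],
        )
        for f in features
    }


def combined_lr(snp, tables):
    """Multiply per-feature ratios (naive conditional-independence assumption).

    Bins never seen in training contribute a neutral ratio of 1.0.
    """
    return prod(tables[f].get(snp[f], 1.0) for f in tables)


# Hypothetical toy reference sets: four disease-related and four benign SNPs.
positive_set = [
    {"at_interface": "yes", "burial": "buried"},
    {"at_interface": "yes", "burial": "buried"},
    {"at_interface": "yes", "burial": "exposed"},
    {"at_interface": "no", "burial": "buried"},
]
negative_set = [
    {"at_interface": "no", "burial": "exposed"},
    {"at_interface": "no", "burial": "exposed"},
    {"at_interface": "no", "burial": "buried"},
    {"at_interface": "yes", "burial": "exposed"},
]

tables = train(positive_set, negative_set, ["at_interface", "burial"])
query = {"at_interface": "yes", "burial": "buried"}
print(f"combined likelihood ratio: {combined_lr(query, tables):.2f}")
```

A combined ratio above 1 shifts the odds toward the disease-related class and a ratio below 1 shifts them toward the benign class; multiplying the ratio by prior odds would give posterior odds. Real structural features are likely to be correlated, so the independence assumption here is a simplification.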
Aim 2 focuses on the functional, structural, and molecular characterization of cancer pathways and the creation of interactomes for known oncogenes such as K-Ras. PrePPI will be combined with network-based algorithms to predict interaction partners of these oncogenes, and the results will be tested with biophysical and cellular assays. In addition, protein family-specific versions of PrePPI will be developed to enable more refined prediction of interaction partners. Finally, comprehensive interactomes will be constructed for the ~550 cancer-related proteins in the Cancer Gene Census maintained by the Catalogue of Somatic Mutations in Cancer (COSMIC), and this information will be incorporated into the expanded PrePPI database. The integration of the structure-enabled annotation of disease-related SNPs with cancer interactomes is very much in keeping with the NIH Precision Medicine Initiative: assigning functions to all SNPs, rather than just the most frequently occurring ones, is crucial to tailoring therapeutic treatments on an individual basis.

Public Health Relevance

Genome-Wide Association Studies (GWAS) have revealed many Single Nucleotide Polymorphisms (SNPs) whose roles in causing disease are unclear. This proposal describes the development of a series of computational tools that will enable a molecular-level understanding of disease-related SNPs and their functional consequences, with a focus on cancer mutations. The tools and resources to be created will support the recently announced initiative in Precision Medicine by characterizing SNPs in the context of protein structures and cellular pathways, thus facilitating the design of individualized therapies.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Mcguirl, Michele
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States
Zeiske, Tim; Baburajendran, Nithya; Kaczynska, Anna et al. (2018) Intrinsic DNA Shape Accounts for Affinity Differences between Hox-Cofactor Binding Sites. Cell Rep 24:2221-2230
Hirabayashi, Yusuke; Kwon, Seok-Kyu; Paek, Hunki et al. (2017) ER-mitochondria tethering by PDZD8 regulates Ca2+ dynamics in mammalian neurons. Science 358:623-630
Lopez-Rivera, Esther; Liu, Yangfan P; Verbitsky, Miguel et al. (2017) Genetic Drivers of Kidney Defects in the DiGeorge Syndrome. N Engl J Med 376:742-754
Hwang, Howook; Dey, Fabian; Petrey, Donald et al. (2017) Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proc Natl Acad Sci U S A 114:13685-13690
Sheng, Ren; Jung, Da-Jung; Silkov, Antonina et al. (2016) Lipids Regulate Lck Protein Activity through Their Interactions with the Lck Src Homology 2 Domain. J Biol Chem 291:17639-50
Hwang, Howook; Petrey, Donald; Honig, Barry (2016) A hybrid method for protein-protein interface prediction. Protein Sci 25:159-65
Ma, Lijiang; Bayram, Yavuz; McLaughlin, Heather M et al. (2016) De novo missense variants in PPP1CB are associated with intellectual disability and congenital heart disease. Hum Genet 135:1399-1409
Harrison, Oliver J; Brasch, Julia; Lasso, Gorka et al. (2016) Structural basis of adhesive binding by desmocollins and desmogleins. Proc Natl Acad Sci U S A 113:7160-5
Park, Mi-Jeong; Sheng, Ren; Silkov, Antonina et al. (2016) SH2 Domains Serve as Lipid-Binding Modules for pTyr-Signaling Proteins. Mol Cell 62:7-20
Westphalen, C Benedikt; Takemoto, Yoshihiro; Tanaka, Takayuki et al. (2016) Dclk1 Defines Quiescent Pancreatic Progenitors that Promote Injury-Induced Regeneration and Tumorigenesis. Cell Stem Cell 18:441-55
