Almost all proteins function by interacting with other proteins. On average, a protein interacts with ~5 other protein partners in the current human interactome. It is therefore of great importance to accurately determine the interface of each interaction, in order to understand how each protein works with different partners to carry out different functions. In our previous Nature Biotechnology study, we implemented a proteome-scale homology modeling approach to generate the first 3D human structural interactome: the interface for each interaction in this network was determined at atomic resolution through co-crystal structures and homology models. Using our 3D interactome, we found that, among >1,800 known disease genes associated with two or more clinically distinct disorders, pairs of mutations on the same gene but in different interfaces with different partners are significantly more likely to cause distinct diseases. However, only 4,150 human protein interactions have co-crystal structures and 2,921 have high-quality homology models. ~50,000 interactions (87% of the current human interactome) are not amenable to current structural modeling methods. Here, we propose to develop a big-data-driven machine-learning approach integrating biophysiochemical, evolutionary, structural, and population genetic features to identify interaction-specific interfaces for the whole human interactome. Because several key features are unavailable for many proteins and interactions, we propose an innovative approach that uses an ensemble of random forest classifiers, named Ensemble Protein Interface Classifier (EPIC), to address this large-scale non-random missing data problem (Aim 1). The high throughput of our massively parallel Clone-seq and INtegrated PrOtein INteractome perTurbation screening (InPOINT) pipeline uniquely enables us to perform real-time experimental parameter optimization (in Years 2-4, we will clone ~1,500 mutations and examine their impact on ~2,500 interactions every year to iteratively evaluate and refine EPIC; Aim 2). Finally, we will construct a comprehensive multiscale 3D interactome for all known human protein-protein interactions: we will collect or generate atomic-resolution structural models for interactions whenever possible (co-crystal structures and homology models), and we will accurately determine interaction-specific interface residues and domains for the whole human interactome. We will deploy an interactive web portal to disseminate our results and allow functional genomic inference in the context of our structural interactome (Aim 3). Our comprehensive multiscale 3D human interactome and the accompanying web portal will greatly reduce the barrier to entry for performing systematic structural analysis on a large number of proteins and their interactions, and open the floodgates for such analyses in genomic studies.
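
The abstract does not specify how the EPIC ensemble handles missing features, so the following is only a minimal sketch of one plausible reading: train one sub-classifier per combination of feature groups, then route each example to the sub-classifier trained on the largest set of groups that the example actually has. All names here (`CentroidScorer`, `train_ensemble`, `predict_interface`, the group labels) are hypothetical, each feature group is reduced to a single number, and a tiny nearest-centroid scorer stands in for a random forest so the sketch needs only the standard library.

```python
from itertools import combinations
from statistics import mean


class CentroidScorer:
    """Placeholder for a random forest: predicts the class with the nearer centroid."""

    def fit(self, rows, labels):
        pos = [r for r, y in zip(rows, labels) if y == 1]
        neg = [r for r, y in zip(rows, labels) if y == 0]
        self.pos = [mean(col) for col in zip(*pos)]
        self.neg = [mean(col) for col in zip(*neg)]
        return self

    def predict(self, row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, self.pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, self.neg))
        return 1 if d_pos < d_neg else 0


def train_ensemble(samples, labels, groups):
    """Train one sub-classifier per non-empty subset of feature groups.

    Each sample is a dict mapping group name -> value, with None marking a
    missing feature. A subset is trained only on samples that have every
    group in it, and only if both classes are represented.
    """
    ensemble = {}
    for k in range(1, len(groups) + 1):
        for subset in combinations(groups, k):
            usable = [
                (s, y) for s, y in zip(samples, labels)
                if all(s.get(g) is not None for g in subset)
            ]
            if {y for _, y in usable} != {0, 1}:
                continue
            rows = [[s[g] for g in subset] for s, _ in usable]
            ensemble[subset] = CentroidScorer().fit(rows, [y for _, y in usable])
    return ensemble


def predict_interface(ensemble, sample):
    """Use the sub-classifier covering the most feature groups available for this sample."""
    available = {g for g, v in sample.items() if v is not None}
    candidates = [subset for subset in ensemble if set(subset) <= available]
    if not candidates:
        return None  # no usable features at all
    best = max(candidates, key=len)
    return ensemble[best].predict([sample[g] for g in best])


# Toy data: 1 = interface residue, 0 = non-interface residue (values are made up).
groups = ("evolutionary", "structural")
samples = [
    {"evolutionary": 0.9, "structural": 0.8},
    {"evolutionary": 0.8, "structural": None},
    {"evolutionary": 0.1, "structural": 0.2},
    {"evolutionary": 0.2, "structural": None},
]
labels = [1, 1, 0, 0]
ensemble = train_ensemble(samples, labels, groups)
```

Under these assumptions, a residue that lacks structural features is still scored by the evolutionary-only sub-classifier instead of being dropped or imputed, which is the point of using an ensemble when data are missing non-randomly.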

Public Health Relevance

Almost all proteins function by interacting with other proteins, and the structural details of these interaction interfaces are key to understanding protein function. However, the interfaces for the vast majority of human protein interactions are currently unknown. Here, we propose to establish an innovative ensemble classifier approach and implement an unprecedented large-scale computational-experimental iterative learning scheme to predict interfaces for the whole human interactome, in anticipation that our predicted interfaces will help dissect functional sites of disease mutations and be useful for rational drug design to target these sites.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM124559-04
Application #: 9963288
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2017-07-01
Project End: 2021-06-30
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 4
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: Cornell University
Department: Miscellaneous
Type: Organized Research Units
DUNS #: 872612445
City: Ithaca
State: NY
Country: United States
Zip Code: 14850

Chen, Siwei; Fragoza, Robert; Klei, Lambertus et al. (2018) An interactome perturbation framework prioritizes damaging missense mutations for developmental disorders. Nat Genet 50:1032-1040
Meyer, Michael J; Beltrán, Juan Felipe; Liang, Siqi et al. (2018) Interactome INSIDER: a structural interactome browser for genomic studies. Nat Methods 15:107-114
Malty, Ramy H; Aoki, Hiroyuki; Kumar, Ashwani et al. (2017) A Map of Human Mitochondrial Protein Interactions Linked to Neurodegeneration Reveals New Mechanisms of Redox Homeostasis and NF-κB Signaling. Cell Syst 5:564-577.e12