Almost all proteins function by interacting with other proteins. Previous studies have shown that the vast majority of damaging single amino acid mutations disrupt only a subset of a protein's specific protein-protein interactions, and that mutations in the same protein that disrupt different interactions tend to cause clinically distinct disorders. It is therefore important to determine the interaction-specific disruptions caused by each mutation. Furthermore, rapid advances in sequencing technologies have enabled the identification of tens of millions of single nucleotide variants (SNVs) in the human population, creating an urgent need to understand the impact of each SNV on the human interactome network. Unfortunately, no current method can predict the specific impact of a large fraction of these SNVs on individual protein-protein interactions. To address this issue, we propose to leverage our massively parallel site-directed mutagenesis pipeline, Clone-seq, to generate clones for ~6,000 coding SNVs in the human population: ~4,000 from gnomAD and ~2,000 to be submitted by the international human genetics community. We will then experimentally examine the impact of every variant on protein stability and on individual protein-protein interactions using the high-throughput DUAL-FLUO and InPOINT (integrating PCA, LUMIER, Y2H, and wNAPPA) assays. This proposal brings together three groups with complementary expertise: the Yu lab in high-throughput interactome experiments and network analysis, the Clark lab in genomic and population genetic studies, and the Alexov lab in comprehensive biophysical and structural modeling of the impact of mutations on the binding free energy of protein interactions. Of the ~6,000 SNVs, we expect to identify ~1,200 disruptive SNVs and ~4,000 SNV-interaction pairs in which the SNV disrupts that specific interaction.
The data produced by our project will increase the available experimental information by >140 in the number of human proteins and >500 in the number of interactions, allowing us, for the first time, to comprehensively assess the relationships between the impact of SNVs on interactions and their various population genetic attributes (including, but not limited to, allele frequency and flanking haplotype, inter-population differentiation, local rate of recombination, allele age, and modes of selection). Finally, we will establish an integrated computational-experimental iterative learning scheme to build SIMPACT, a multi-layer random-forest-based framework designed to accurately predict the specific impact of every missense SNV on each individual protein-protein interaction. Our proposed work will fuel hypothesis-driven research, significantly improve our functional understanding of variants, and will likely fundamentally change experimental design and data interpretation for future whole-genome and whole-exome studies.
The dramatic increase in DNA variants discovered through advances in sequencing technologies has not been adequately translated into therapeutic successes. Although many of these variants are related to human disorders, the overwhelming number of non-functional variants makes assessing functional significance a steep challenge. In this study, we aim to develop a high-throughput pipeline to rapidly clone and directly test a large number of coding variants for their impact on the human interactome network, and to use the results to build a machine learning pipeline that predicts the functional impact of all coding variants. We anticipate that both our experimental data and our computational pipeline will lead to broad clinical and therapeutic applications.