Protein-protein interactions (PPIs) play a central role in all biological processes. Akin to the complete sequencing of genomes, a complete description of interactomes is a fundamental step towards a deeper understanding of biological processes, and has vast potential to impact systems biology, genomics, molecular biology and therapeutics. Although high-throughput biochemical approaches for discovering PPIs have proven very successful, the current experimental coverage of the interactome remains inadequate and would benefit from computational tools. The broad, long-term goal of this proposal is to harness the information provided by structure-based computational approaches as a potentially high-quality, high-coverage data source for large-scale integrative approaches to interactome construction. Specifically, this project aims to: 1) develop new structure-based prediction methods that can be applied on a genome scale, and 2) integrate these predictions with other functional genomic information to predict PPIs at a genome scale. This project will also generate testable hypotheses for experimental investigations. A key product of the proposed research is the LTHREADER program, a localized threading program that will simultaneously align query sequence-pairs to templates of protein-protein interfaces. By exploiting information contained in protein complex interfaces, it may significantly improve upon the state of the art in coverage and prediction quality. The core computational aspects include the development of algorithms for threading query sequence-pairs to templates (using linear programming), learning statistical potentials (using SVMs), and combining multiple protein interface scores for PPI prediction (using boosting). The output from these structure-based approaches will be combined with other functional genomic data in the Struct2Net framework for predicting PPIs (using random forests).
A final product of the proposed research will be a comprehensive database of genome-wide PPI predictions derived from purely structure-based as well as integrative approaches. The database will also include extracellular ligand-receptor interactions. The prediction of PPIs will enable better elucidation of extracellular and intracellular signaling networks, which has direct medical implications for drug target identification. For example, a promising public-health application of this research is the rational design of therapeutics that inhibit or interfere with the binding of extracellular ligands to receptors. All computational algorithms, software, and databases produced will be made publicly available for further studies.

Relevance
Proteins interact with each other to communicate within and between cells, forming networks (the Interactome) that play fundamental roles in all biomedical processes, including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Understanding these interaction networks on a large scale will empower both rational, targeted drug design and more intelligent disease management. In this project, we develop computational methods for structure-based prediction of protein-protein interactions, and integrate these predictions with available high-throughput genomic data to predict the Interactomes of entire species' genomes.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Remington, Karin A
Massachusetts Institute of Technology
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Ma, Cheng-Yu; Chen, Yi-Ping Phoebe; Berger, Bonnie et al. (2017) Identification of protein complexes by integrating multiple alignment of protein interaction networks. Bioinformatics 33:1681-1688
Orenstein, Yaron; Puccinelli, Robert; Kim, Ryan et al. (2017) Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping. Cell Syst 5:230-236.e5
Liu, Yang; Palmedo, Perry; Ye, Qing et al. (2017) Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks. Cell Syst :
Khurana, Vikram; Peng, Jian; Chung, Chee Yeun et al. (2017) Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways. Cell Syst 4:157-170.e14
Toth-Petroczy, Agnes; Palmedo, Perry; Ingraham, John et al. (2016) Structured States of Disordered Proteins from Genomic Sequences. Cell 167:158-170.e12
Cho, Hyunghoon; Berger, Bonnie; Peng, Jian (2016) Compact Integration of Multi-Network Topology for Functional Analysis of Genes. Cell Syst 3:540-548.e5
Nazeen, Sumaiya; Palmer, Nathan P; Berger, Bonnie et al. (2016) Integrative analysis of genetic data sets reveals a shared innate immune component in autism spectrum disorder and its co-morbidities. Genome Biol 17:228
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie (2016) RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data. Bioinformatics 32:i351-i359
Sahni, Nidhi; Yi, Song; Taipale, Mikko et al. (2015) Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161:647-660
Cho, Hyunghoon; Berger, Bonnie; Peng, Jian (2015) Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks. Res Comput Mol Biol 9029:62-64

Showing the most recent 10 out of 41 publications