Molecular interactions play a central role in all biological processes. Just as complete genome sequences were, complete descriptions of interactomes are a fundamental step towards a deeper understanding of biological processes. They have vast potential to impact systems biology, genomics, molecular biology, and therapeutics. Protein-protein interactions (PPIs) and protein-RNA interactions (PRIs) are of particular interest because they are critical to the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. High-throughput (HTP) experimental PPI and PRI data are rapidly accumulating. However, building complete and confident datasets still requires multiple replicates of expensive screens. This proposal aims to develop new methods that will significantly advance structure-based approaches to predicting PPIs and PRIs and boost confidence in emerging HTP data, with the goal of comprehensive interactome mapping at lower cost. Taken together, these methods will vastly expand our understanding of macromolecular networks. We will continue to devise structure-based methods for PPI prediction and branch out to methods for PRI prediction. This represents a major shift from the purely sequence-based approaches that most bioinformatics methods use to predict molecular interactions, and it will enhance the coverage and accuracy of the complete interactome. We will also build structure-based computational frameworks for boosting confidence in HTP PPI and PRI datasets. These frameworks will provide a comprehensive assessment of in-house and public HTP data, with potential biomedical applications such as heat shock protein-kinase interactions relevant to the development of cancer therapeutics, the role of MAPK6 in a cancer-related signaling network, and the roles of (long non-coding) RNA-protein binding in neurodegenerative disease.
Finally, we will computationally screen for PPIs and PRIs at the genome scale and expand our Struct2Net webserver to disseminate tools based on our methods and results to the community. An increasing number of HTP interaction datasets are being determined. These present new opportunities to leverage the data together with structural insights to map binding sites and uncover the molecular mechanisms underlying cellular functions. Successful completion of these aims will result in computational methods that significantly increase our confidence in HTP protein-protein and protein-RNA interaction data. These methods will also reveal fundamental aspects of how these interactions function and generate testable hypotheses for experimental investigation. All developed software will be made publicly available.

Public Health Relevance

Biological processes are carried out through thousands of interactions between various types of molecules (the interactome). These interactions play fundamental roles in all biomedical processes, including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Understanding these interaction networks on a large scale will enable both rational, targeted drug design and more informed disease management. In this project, we develop computational methods for structure-based prediction of protein-protein and protein-RNA interactions. We then integrate these predictions with available high-throughput genomic data to predict the interactomes of entire species' genomes.

National Institutes of Health (NIH)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Wu, Mary Ann
Massachusetts Institute of Technology
Organized Research Units
United States
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie (2016) RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data. Bioinformatics 32:i351-i359
Nazeen, Sumaiya; Palmer, Nathan P; Berger, Bonnie et al. (2016) Integrative analysis of genetic data sets reveals a shared innate immune component in autism spectrum disorder and its co-morbidities. Genome Biol 17:228
Toth-Petroczy, Agnes; Palmedo, Perry; Ingraham, John et al. (2016) Structured States of Disordered Proteins from Genomic Sequences. Cell 167:158-170.e12
Chirn, Gung-Wei; Rahman, Reazur; Sytnikova, Yuliya A et al. (2015) Conserved piRNA Expression from a Distinct Set of piRNA Cluster Loci in Eutherian Mammals. PLoS Genet 11:e1005652
Orenstein, Yaron; Berger, Bonnie (2015) Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers. J Comput Biol :
Simmons, Sean; Peng, Jian; Bienkowska, Jadwiga et al. (2015) Discovering What Dimensionality Reduction Really Tells Us About RNA-Seq Data. J Comput Biol 22:715-28
Taipale, Mikko; Tucker, George; Peng, Jian et al. (2014) A quantitative chaperone interaction network reveals the architecture of cellular protein homeostasis pathways. Cell 158:434-48
Waldispühl, Jérôme; O'Donnell, Charles W; Will, Sebastian et al. (2014) Simultaneous alignment and folding of protein sequences. J Comput Biol 21:477-91
Tucker, George; Loh, Po-Ru; Berger, Bonnie (2013) A sampling framework for incorporating quantitative mass spectrometry data in protein interaction analysis. BMC Bioinformatics 14:299
Berger, Bonnie; Peng, Jian; Singh, Mona (2013) Computational solutions for omics data. Nat Rev Genet 14:333-46

Showing the most recent 10 out of 29 publications