Complex biological systems and cellular networks underlie most genotype-to-phenotype relationships. In the last decade, basic concepts of network biology have been described, emphasizing why cellular networks are important to consider in biology. Importantly, it is becoming increasingly clear that more high-quality, empirically derived datasets are needed to better describe biological networks and genotype-to-phenotype relationships. The interactome of an organism is the network formed by the complete set of interactions that can occur in a physiologically relevant dynamic range between all its macromolecules, including protein-protein, DNA-protein, RNA-protein, and RNA-RNA interactions. In this proposal, we focus on high-throughput (HT), proteome-scale mapping of what we refer to as the REFERENCE human binary protein-protein interactome network map. Major innovations in this application enable us to define a clear roadmap for completion of this REFERENCE map by the end of this decade. During this coming cycle, we will expand the human HT binary interactome map from ~15% coverage, which is the milestone of the current cycle, to ~50%. We will also briefly discuss how we foresee further expansion to near completeness thereafter.
The accumulation of DNA sequencing data exploded for the Human Genome Sequencing project in the 1990s when four crucial elements were assembled: i) cosmid, BAC, and YAC clone resources covering most of the genome; ii) automated laser-fluorescence sequencing; iii) the PHRED score used to systematically assess sequencing data quality; and iv) the development of "hands-off" automated experimental steps.
We describe below how the human binary interactome mapping project is reaching a similarly explosive phase: i) having significantly contributed to the ORFeome Collaboration (OC), we now have a nearly complete protein-coding ORF clone resource; ii) we developed a new strategy to apply the power of next-generation sequencing to interactome mapping; iii) we have published a new empirical framework that systematically assesses interactome mapping data quality; and iv) we will describe new "hands-off" automated strategies that greatly increase throughput and decrease cost.
Our specific aims are: i) to expand human binary interactome mapping to a full complement of protein-coding genes cloned by OC, ii) to reach ~50% coverage of the REFERENCE human binary interactome network map, and iii) to expand global network analyses of our newly mapped human binary interactome network.

Public Health Relevance

The availability of (nearly) complete genome sequences for several model organisms and for human is changing the way scientists formulate and address biological questions. With large numbers of protein predictions, the traditional one-at-a-time approach can now be complemented by more global strategies that consider all proteins at once. Such approaches, referred to as systems biology, have the ultimate goal of providing quantitative and dynamic models to describe biological processes. One major impediment to this prospect, however, is that most predicted proteins have not yet been experimentally characterized in detail. Interactome maps can be used to formulate functional hypotheses for thousands of uncharacterized genes. In addition, global features of the resulting interactome networks have been proposed that provide worthwhile biological insights. From these insights and hypotheses, a better understanding of disease processes and better strategies for therapeutic intervention are anticipated.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Gatlin, Christine L
Dana-Farber Cancer Institute
United States
Choi, Dongsic; Montermini, Laura; Kim, Dae-Kyum et al. (2018) The Impact of Oncogenic EGFRvIII on the Proteome of Extracellular Vesicles Released from Glioblastoma Cells. Mol Cell Proteomics 17:1948-1964
Díaz-Mejía, J Javier; Celaj, Albi; Mellor, Joseph C et al. (2018) Mapping DNA damage-dependent genetic interactions in yeast via party mating and barcode fusion genetics. Mol Syst Biol 14:e7985
Cheng, Feixiong; Desai, Rishi J; Handy, Diane E et al. (2018) Network-based approach to prediction and population-based validation of in silico drug repurposing. Nat Commun 9:2691
Cenik, Can; Chua, Hon Nian; Singh, Guramrit et al. (2017) A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification. RNA 23:270-283
Jo, Myungjin; Chung, Ah Young; Yachie, Nozomu et al. (2017) Yeast genetic interaction screen of human genes associated with amyotrophic lateral sclerosis: identification of MAP2K5 kinase as a potential drug target. Genome Res 27:1487-1500
Starita, Lea M; Ahituv, Nadav; Dunham, Maitreya J et al. (2017) Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet 101:315-325
Chung, Chee Yeun; Khurana, Vikram; Yi, Song et al. (2017) In Situ Peroxidase Labeling and Mass-Spectrometry Connects Alpha-Synuclein Directly to Endocytic Trafficking and mRNA Metabolism in Neurons. Cell Syst 4:242-250.e4
Khurana, Vikram; Peng, Jian; Chung, Chee Yeun et al. (2017) Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways. Cell Syst 4:157-170.e14
Yachie, Nozomu; Petsalaki, Evangelia; Mellor, Joseph C et al. (2016) Pooled-matrix protein interaction screens using Barcode Fusion Genetics. Mol Syst Biol 12:863
Sun, Song; Yang, Fan; Tan, Guihong et al. (2016) An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res 26:670-80

Showing the most recent 10 out of 38 publications