PI: Marc Vidal (Dana-Farber Cancer Institute); Co-PIs: Joseph Ecker (The Salk Institute for Biological Studies) and David Hill (Dana-Farber Cancer Institute)

For over half a century it has been conjectured that macromolecules form complex networks of functionally interacting components, and that the molecular mechanisms underlying most biological processes correspond to particular steady states adopted by such cellular networks. However, until recently, such systems-level theoretical conjectures remained largely unappreciated, mainly because of a lack of supporting experimental data.

This project will develop a large-scale, high-coverage protein-protein interaction, or "interactome", map for plants that will provide a starting point for systems-level studies in plants. The recent completion of whole genome sequences for several plant species has revealed an enormous amount of conservation among their encoded proteins. This high level of conservation suggests that the development of proteome-wide maps using a few well-studied reference plant species is a cost-effective approach to advance plant research as a whole. Importantly, this project is expected to have broad implications for research on economically important plant species that are more difficult to study.

Building on earlier work from the two participating laboratories, this project utilizes an existing set of cDNA-derived plant open-reading-frame (ORF) clones to construct a high-coverage "plant interactome mapping resource". A high-quality plant protein interactome network map will be generated using two complementary approaches: an improved yeast two-hybrid assay and a protein microarray-based approach. A set of ~12,000 Arabidopsis genes for which full-length protein-encoding ORFs have already been cloned and sequence-validated will provide the scaffold of this project. In addition, a set of 2,500 rice ORF clones targeted to two biological areas (plant innate immunity and kinase signal transduction pathways) will be used to expand the rice interactome network map.

The impact of this project will be in two broad areas. First, the completion of the proposed research will result in an important new resource for the plant biology community: a large-scale plant protein interactome map. The availability of increased amounts of protein interaction information should positively impact a variety of plant research endeavors, such as the analysis of cellular metabolic networks and other systems biology studies. Importantly, all of the interaction data, ORF clones, and DNA sequences will be made freely available to the research community through the project websites and through established plant databases such as Gramene and TAIR. The long-term impact of these enabling tools and technologies on agriculture is expected to be profound, providing fundamental knowledge for the construction of new plant varieties with superior agronomic traits. Second, the project will provide training for plant scientists, and enable their participation, in protein network studies through the annual "ORFeome Meeting" conference/workshop. With respect to outreach, the project will also provide training opportunities for minority high school and undergraduate students as well as for middle and high school teachers.

Project Report

More than ten years after the sequencing of the first plant genome, a significant portion of the proteome of any plant remains to be functionally and biochemically characterized. From our collaborative efforts, we have generated a high-coverage "plant interactome mapping resource" of over 6,200 binary protein-protein interactions. This is the first large-scale systematic plant interactome map, and the vast majority of its interactions are novel. Analysis of the network revealed intricate connectivity of signaling pathways, identified biologically consistent network communities, and indicated that the interactome network itself may be a substrate of selective pressure. To facilitate the use of our plant interactome by the scientific community, we have made our data available in our recent Science publication as well as through our websites.
Systematically derived proteome-scale datasets allow us to investigate the global structural organization of biological systems. We integrated a large set of diverse external data sources (including protein expression data, post-translational modification data, protein domain annotations, gene ontology information, paralogy relationships, and interaction information from the literature) to describe the proteins in the Arabidopsis protein interaction network and establish functional associations. Sequences from other plant and non-plant species were assembled in order to evaluate homology and orthology relationships. Based on these data, we are able to estimate the evolutionary age of each protein in the network, to identify interactions that could take place in crop plants, and to identify pairs of interacting proteins that are significantly co-present or co-absent in other plant species.
In many networks, communities of tightly interconnected components that function together can be identified. We applied a novel edge clustering approach to identify communities in our plant network and investigated their biological relevance.
Edge clustering approaches, in contrast to node clustering methods, use edges (i.e., protein-protein interactions) as the elements from which communities are built. This allows a protein to be assigned to more than one community, an appealing concept because many proteins participate in different biological functions. Detailed inspection of these communities recapitulated available biological information, and our results are corroborated by previous findings in barley. The significant enrichment of shared GO annotations within communities, literature-based inspection of intra-community relationships, and examination of community boundaries together support the relevance of the communities identified in our plant interactome.
Topological analysis of our plant interactome recapitulates the pattern seen in other biological networks: a "scale-free" characteristic in which a few proteins have many interactors but most have only a few. Such scale-free networks are resilient to random perturbations, but sensitive to, and easily destabilized by, targeted attacks on their most highly connected hubs. In a companion study that leveraged the new knowledge obtained from our plant interactome, we demonstrated how effector proteins of the bacterial pathogen Pseudomonas syringae and the oomycete Hyaloperonospora arabidopsidis, evolutionarily separated by ~1 billion years, target the host network. Simulations demonstrate that an attack on experimentally identified effector targets is much more damaging to the network structure than an attack on the same number of randomly selected proteins. Consistent with this, we found that, although independently evolved and without any discernible homology to each other, the effector proteins from both pathogens converge on a small number of highly connected host proteins, knockout of which leads to either an enhanced disease resistance or an enhanced disease susceptibility phenotype in infection assays.
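The contrast between random perturbation and targeted attack on hubs can be illustrated with a minimal sketch. It is not the simulation used in the companion study: the network here is a toy graph grown by preferential attachment rather than the measured interactome, the "attack" removes the highest-degree nodes rather than experimentally identified effector targets, and damage is measured as the size of the largest connected component. All function names and parameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def preferential_attachment_graph(n, m, rng):
    """Toy scale-free-like graph (an assumption, not the real interactome):
    each new node links to m existing nodes chosen proportionally to degree."""
    adj = defaultdict(set)
    # Start from a small fully connected seed of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears once per edge end, so uniform sampling from this
    # list is degree-proportional sampling.
    stubs = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting 'removed'."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def compare_attacks(n=400, m=2, k=40, trials=20, seed=0):
    """Remove k hubs versus k random nodes (averaged over several trials)."""
    rng = random.Random(seed)
    adj = preferential_attachment_graph(n, m, rng)
    hubs = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k]
    targeted = largest_component_size(adj, hubs)
    nodes = list(adj)
    random_sizes = [
        largest_component_size(adj, rng.sample(nodes, k)) for _ in range(trials)
    ]
    return targeted, sum(random_sizes) / trials


targeted, random_mean = compare_attacks()
print("largest component after hub removal:   ", targeted)
print("largest component after random removal:", random_mean)
```

In this toy setting, removing the hubs leaves a smaller largest component than removing the same number of random nodes, which is the qualitative behavior described above for scale-free networks.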
To exploit our interactome information for understanding crop biology, it is critical to understand how evolutionary forces affect and remodel biological networks. The high fraction of duplicated genes in the Arabidopsis genome compared to non-plant species, combined with the relatively large size of our plant interactome, provides interactome data for 1,882 paralogous pairs. These pairs span a wide range of apparent interaction rewiring, as measured by the fraction of shared interactors for each pair. To study interaction rewiring dynamics, we dated gene duplication events using a comparative genomics approach and found that the average fraction of common interactors decreases over evolutionary time, showing substantial and rapid divergence. For Arabidopsis, paralogous pairs that have been diverging for ~700 million years still share more interactors than random protein pairs, indicating that the long-term fate of paralogous proteins is not necessarily a complete divergence of their interaction profiles. The rate of rewiring appears "rapid-then-slow", as suggested by a power-law decay rather than the exponential decay expected from random rewiring. The fact that interactions diverge in a time-dependent manner similar to protein sequences supports the hypothesis that protein-protein interactions drive the evolution of duplicated genes.
In summary, our high-quality plant interactome network should not only hasten the functional characterization of unknown proteins, including those with potential biotechnological utility, but also enable systems-level investigations of genotype-to-phenotype relationships in the plant kingdom. Our results also illustrate mechanisms and strategies by which plants cope with pathogenic challenges, while our study of sequence variation, conservation, mutation, and evolutionary rate has shed light on how natural selection drives evolution.
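The "fraction of shared interactors" used in the paralog analysis above can be sketched as follows. The report does not give the exact definition, so this example assumes a Jaccard-style ratio (shared interactors divided by all interactors of either protein) and excludes the two paralogs themselves from the count. The function name and the protein identifiers are hypothetical.

```python
def shared_interactor_fraction(interactions, a, b):
    """Jaccard-style fraction of interactors shared by proteins a and b.

    'interactions' is an iterable of undirected (protein, protein) pairs.
    The Jaccard form is an assumption; the report does not specify it.
    """
    neighbors = {}
    for u, v in interactions:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Ignore self-interactions and the direct a-b interaction.
    na = neighbors.get(a, set()) - {a, b}
    nb = neighbors.get(b, set()) - {a, b}
    union = na | nb
    if not union:
        return 0.0
    return len(na & nb) / len(union)


# Hypothetical paralogs P1 and P2 sharing one of three distinct interactors.
toy_interactions = [
    ("P1", "X"),
    ("P1", "Y"),
    ("P2", "Y"),
    ("P2", "Z"),
    ("P1", "P2"),
]
fraction = shared_interactor_fraction(toy_interactions, "P1", "P2")
print(fraction)
```

A value near 1 would indicate paralogs with largely conserved interaction profiles, and a value near 0 would indicate extensive rewiring since duplication.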

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Application #: 0703905
Program Officer: Diane Jofuku Okamuro
Project Start:
Project End:
Budget Start: 2007-09-15
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $7,971,396
Indirect Cost:
Name: Dana-Farber Cancer Institute
Department:
Type:
DUNS #:
City: Boston
State: MA
Country: United States
Zip Code: 02215