CoPI: Vineet Bafna and Laurie G. Smith (University of California - San Diego)

Collaborators: W. Joan Chen (San Diego State University), Laura J. Olsen (University of Michigan - Ann Arbor), Steven Rodermel and Patrick Schnable (Iowa State University), and Frank Hochholdinger (Universität Tübingen, Tübingen, Germany)

The most fundamental goal of genome science is to discover all of the protein-coding genes, and then to discern the abundance, location, and exact chemical composition of every protein made during the life cycle of an organism; the full set of these proteins is called the proteome. A complete and accurately annotated proteome provides the foundation for studies of systems biology and molecular evolution, as well as for hypothesis-driven research. Recent progress in proteogenomics (using proteomic information to annotate the genome) has established it as a data-driven method that complements nucleotide (DNA and RNA)-based annotation strategies. Genome-wide, quantitative proteomics also makes possible the creation of a protein atlas that reveals the anatomical distribution of the proteome and protein sub-cellular locations. This project has two research aims. Aim 1 is to create an Atlas of Maize Proteins. The atlas includes the identity and relative amount of 40,000-50,000 proteins in each of 37 different tissues and stages of maize development. The atlas also includes the protein composition of the plasma membrane, chloroplast, mitochondrion, and peroxisome, along with information about the protein changes caused by abiotic and biotic stress. Aim 2 provides proteogenomic discovery, revision, and confirmation of 40,000-50,000 maize gene models, including the identification of exons, the definition of translation start sites and exon borders, and the determination of the correct exon reading frames. This project enhances genome-enabled maize research and breeding by increasing the completeness and accuracy of maize genome annotation. Furthermore, investigations of maize physiology, development, cell functions, and breeding benefit from knowledge of the anatomical and sub-cellular distribution of maize proteins provided by the Atlas of Maize Proteins.
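
The proteogenomic idea behind Aim 2 (using observed peptides to locate coding regions and their reading frames) can be illustrated with a minimal sketch. It assumes a plain six-frame translation and exact string matching; the function names and the toy sequence are hypothetical, and the project's actual search tools, which must also handle introns and mass-spectrum scoring, are not described here.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def map_peptide(peptide, dna):
    """List (frame, amino-acid offset) for every exact peptide match."""
    hits = []
    for frame, protein in six_frame_translations(dna).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    toy_locus = "ATGGCCAAGTAA"  # hypothetical sequence, not maize data
    print(map_peptide("MAK", toy_locus))
```

A peptide that matches only one frame is evidence both that the region is translated and that this frame is the correct one, which is the kind of information used to confirm or revise a gene model.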
Interdisciplinary educational and outreach opportunities are provided to post-docs, graduate students, undergraduates, high school students and Cal State researchers, with an emphasis on involvement of under-represented minorities.

This project will provide interdisciplinary educational and outreach opportunities for post-docs, graduate students, undergraduates, high school students and San Diego State University researchers, with an emphasis on involvement of under-represented minorities. All project participants in San Diego, including post-docs, graduate students and undergraduates, are receiving unique, interdisciplinary training made possible by this project's collaboration among investigators with expertise in mass spectrometry, bioinformatics, maize developmental and cell biology, and plant responses to stress. High school students are participating in the research via a module developed for BioBridge, a UC San Diego outreach program that brings hands-on learning activities into San Diego public schools. Researchers at San Diego State University will receive training and education in proteomics and bioinformatics through workshops. Access to the biological materials used in the project is provided by the Germplasm Resources Information Network (GRIN, www.ars-grin.gov/). Access to the project results, including data and software, is provided by websites and publications by the investigators (http://briggs.ucsd.edu/; http://www-cse.ucsd.edu/~vbafna/). The long-term repositories for project data are Tranche (https://proteomecommons.org/index.jsp) and Gramene (www.gramene.org).

Project Report

This project involved two major research and education activities. The first was to create an atlas of maize proteotypes that identifies as many proteins as possible and reports their relative abundance and levels of phosphorylation across 33 organs and stages of development; sub-cellular fractions (chloroplasts, mitochondria, glyoxysomes, and plasma membrane) also were characterized. The second activity was to discover new protein-coding genes and to correct existing gene models by mapping peptides to an expanded translation of the genome.

We have completed the generation of data. Eleven papers have been published. Three additional manuscripts are being written. We have begun the process of transferring our data to MaizeGDB to ensure long-term access by the community. Several scholars have been using our results to accelerate their own research progress.

We identified 172,529 distinct peptides derived from 39,568 proteins that map to 23,988 genes; included are 32,183 distinct phosphopeptides. The proteotypes of the green leaf, pollen, and embryo are distinct from each other and from most non-photosynthetic tissues, which are similar to each other (endosperm, pericarp/aleurone, ear, silk, female spikelet, tassel, anther, immature leaf, root). The distribution of phosphorylation is pSer (80%), pThr (18%), and pTyr (2%).

Comparison of our seed data to published transcriptome data revealed that mRNA and protein levels are poorly correlated. We found that many of the most abundant mRNAs are not associated with detectable levels of protein, implying that many messages are under translational control or that the proteins are immediately degraded. Even more interesting was the finding that many of the most abundant proteins are not associated with detectable levels of mRNA. We chose several of these cases and confirmed the accuracy of the published mRNA data using RT-PCR with the same samples from which we had extracted proteins. We have evidence for three different mechanisms to explain these cases: some are due to long-lived proteins that were made from message that was only present earlier in development; some of the proteins appear to have been transported in from other tissues; and some are the result of mRNAs cycling between high levels at night and low levels in the day, when samples were taken.

A distinguishing feature of our data is that every sample is represented by 4 or more biological replicates, which enables quantitative analysis. Two key discoveries resulted from clustering and network analysis of our seed data. Clustering revealed that transcription factor families tend to be co-expressed. For example, the 48 observed bZIP proteins were mostly expressed in the mature endosperm, whereas 27 PHD proteins were mostly expressed in the immature embryo. Family patterning of transcription factor expression is a surprise, but it holds true for most of the 618 observed transcription factors comprising 47 families. This discovery suggests that new functions enabled by evolutionary radiation of transcription factor gene families are constrained to utilities within their original proteotype.

The second discovery resulted from network analysis used to infer protein kinase substrates from patterns of protein phosphorylation. Phosphorylation of the activation loop of protein kinases is causally related to enzyme activity, so substrates were inferred by quantitative correlation between activation loop phosphorylation and phosphorylation of all other proteins. A network was constructed inferring 762 substrates of nine protein kinases in the seed. The network was partially validated by observing that eight orthologs of published substrates for MPK6 in Arabidopsis were observed in the maize MPK6 network.

Two doctoral candidates and three Master’s candidates were trained and graduated in the course of conducting the project. A postdoctoral scholar was trained and successfully transitioned to an independent position at a university. Two additional postdoctoral scholars were trained and are preparing to enter the job market.
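
The kinase-substrate inference described in this report, correlating activation loop phosphorylation with phosphorylation of other proteins across samples, can be sketched as follows. This is only an illustration under stated assumptions: the use of Pearson correlation, the `min_r` cutoff, and all kinase and site names are hypothetical, since the report does not state which correlation measure or threshold was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def infer_substrates(kinase_loops, phosphosites, min_r=0.9):
    """Map each kinase to phosphosites whose abundance tracks its activation loop.

    kinase_loops: {kinase: activation-loop phosphorylation per sample}
    phosphosites: {site: phosphorylation level per sample}
    min_r: assumed correlation cutoff (hypothetical, not from the report)
    """
    network = {}
    for kinase, loop_profile in kinase_loops.items():
        network[kinase] = sorted(
            site
            for site, profile in phosphosites.items()
            if site != kinase and pearson(loop_profile, profile) >= min_r
        )
    return network


if __name__ == "__main__":
    # Toy profiles across five samples; values are invented for illustration.
    loops = {"KIN_A": [1, 2, 3, 4, 5]}
    sites = {
        "site1": [2, 4, 6, 8, 10],  # rises with the kinase
        "site2": [5, 4, 3, 2, 1],   # falls as the kinase rises
        "site3": [3, 3, 3, 3, 3],   # constant
    }
    print(infer_substrates(loops, sites))
```

Correlation alone cannot distinguish direct substrates from proteins that are merely co-regulated, which is presumably why the report describes the network as partially validated against published MPK6 substrates.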

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Application #: 0924023
Program Officer: Diane Jofuku Okamuro
Budget Start: 2009-09-15
Budget End: 2013-08-31
Fiscal Year: 2009
Total Cost: $3,801,162
Name: University of California San Diego
City: La Jolla
State: CA
Country: United States
Zip Code: 92093