PI: Jay J. Thelen, Department of Biochemistry, University of Missouri-Columbia
Collaborator: Dong Xu, Computer Science Department, University of Missouri-Columbia

Plant seeds are important, renewable sources of natural products such as oil, protein, starch and fiber. Although the biosynthetic pathways for these storage compounds are mostly known, it is not clear how these pathways are regulated in oilseeds, which produce higher quantities of oil and protein than other seeds. Protein phosphorylation is an important regulatory mechanism for many metabolic enzymes, and preliminary data reveal hundreds of phosphoproteins in developing seeds of oilseed rape and soybean. Identifying and quantifying phosphoprotein control mechanisms in the developing seed of three oilseed species will provide the basis for understanding the regulatory networks involved in oilseed development and will help in designing new strategies for crop improvement. This proposal aims to unravel the networks of phosphoproteins and protein kinases primarily involved in regulating metabolic processes during the oil and protein accumulation phase of seed development in three oilseed plants: oilseed rape, soybean, and Arabidopsis thaliana. In addition to this fundamental scientific objective, the broader impacts of this proposal include: (1) organizing proteomics-themed workshops for high school teachers and developing web-based proteomics tutorials for high school students; (2) training plant scientists from the undergraduate through postdoctoral levels in the emerging discipline of proteomics; (3) recruiting under-represented students through University-sponsored undergraduate research programs and fellowships; (4) developing techniques for phosphopeptide identification and quantification using the emerging technology of absolute quantification (AQUA) peptides; (5) generating synthetic AQUA peptide tools for the plant protein phosphorylation community; and (6) disseminating phosphoproteomics data publicly in multiple, intuitive formats to give the scientific community new directions for modifying seed metabolism to benefit health and the environment.
Project website: http://oilseedproteomics.missouri.edu
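Item (4) of the abstract refers to absolute quantification (AQUA) peptides. In the general AQUA approach, a synthetic, stable-isotope-labeled ("heavy") version of a target peptide is spiked into the sample at a known amount, and the endogenous peptide is quantified from the ratio of the two mass spectrometry signals. The Python sketch below illustrates that arithmetic, together with the phosphorylation stoichiometry obtained when a phosphopeptide and its non-phosphorylated counterpart are both measured this way. It is not this project's pipeline: the class and function names are hypothetical, and all peak areas are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AquaMeasurement:
    """Peak areas for one target peptide and its spiked heavy AQUA standard (hypothetical type)."""
    native_area: float    # integrated MS signal of the endogenous (light) peptide
    standard_area: float  # integrated MS signal of the heavy AQUA standard
    standard_fmol: float  # known amount of heavy standard spiked into the sample


def absolute_amount_fmol(m: AquaMeasurement) -> float:
    """Endogenous amount = (native signal / standard signal) x spiked standard amount."""
    if m.standard_area <= 0:
        raise ValueError("standard peak area must be positive")
    return (m.native_area / m.standard_area) * m.standard_fmol


def phospho_stoichiometry(phospho_fmol: float, nonphospho_fmol: float) -> float:
    """Fraction of a site that is phosphorylated, from paired phospho/non-phospho peptides."""
    total = phospho_fmol + nonphospho_fmol
    if total <= 0:
        raise ValueError("total amount must be positive")
    return phospho_fmol / total


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    phospho = absolute_amount_fmol(
        AquaMeasurement(native_area=2.0e6, standard_area=4.0e6, standard_fmol=50.0))
    nonphospho = absolute_amount_fmol(
        AquaMeasurement(native_area=6.0e6, standard_area=4.0e6, standard_fmol=50.0))
    print(phospho, nonphospho, phospho_stoichiometry(phospho, nonphospho))  # 25.0 75.0 0.25
```

The stoichiometry step assumes both peptides derive from the same protein site and that each is quantified against its own heavy standard.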

Project Report

Funding from this grant resulted in more than 50 original research articles and trained more than 30 students and professional scientists. It supported the development of plant proteomics mini-symposia and of high school teacher workshops and labs aimed at educating the educators on state-of-the-art protein analysis methods. The many scientific outcomes are summarized below in four sections corresponding to the aims of this project.

1) Protein phosphorylation is a fundamental mechanism for post-translational regulation of proteins in all organisms. Plants in particular appear to have exploited this mechanism of post-translational control extensively, as their genomes contain a higher proportion of protein kinase genes than those of any other life form. The development of approaches for interrogating the phosphoproteome (all phosphoproteins in a cell) en masse has resulted in an ongoing, systematic survey of plant phosphoproteins by the scientific community. This investigation contributed to that effort by providing the first phosphoproteome catalog of developing seed. We identified thousands of high-quality phosphopeptides and phosphorylation sites from developing seed of canola, soybean, and Arabidopsis. The majority of these sites are new relative to other plant phosphoproteomic investigations, suggesting that seeds harbor novel signaling pathways. We are further exploring some of these new phosphoproteins, for example acetyl-CoA carboxylase, which catalyzes the committed step of de novo fatty acid synthesis.

2) In total, the scientific community has mapped more than 30,000 protein phosphorylation events in different plant systems, both models and crops. With this increasing diversity of experimentally mapped phosphorylation sites, one problem the plant science community faces is cataloging and organizing these phosphorylation events so that they can be readily accessed.
To address this problem, we developed the Plant Protein Phosphorylation Database (P3DB), available at p3db.org. This database is a comprehensive compendium of high-quality protein phosphorylation events from any plant system. Users can mine its phosphorylation sites in multiple, intuitive ways, including several keyword search tools and primary sequence similarity searches through an embedded BLAST feature. Users can also submit future plant phosphoproteomic datasets directly through a series of guided steps, with tutorials that explain the required format. The database is currently at version 3.0, and its ease of use is reflected in its widespread adoption by the plant (and non-plant) research community.

3) Mapping all plant protein phosphorylation events by experimental approaches alone is problematic because of biases associated with bottom-up proteomic methodologies and the sensitivity limits of current mass spectrometry instrumentation. Nevertheless, the large number of mapped phosphorylation events represents a reservoir of information that can be used to devise better phosphorylation prediction algorithms. We leveraged these data to produce a new phosphorylation prediction algorithm called MUSite. This program combines three different feature extraction methods, including protein disorder analysis, which makes it the most reliable general prediction algorithm for plants, if not for all eukaryotes. It was benchmarked against three other leading general prediction algorithms and outperformed each of them by receiver operating characteristic (ROC) curve analysis. Version 2.0 was recently completed, and the bioinformatics community is actively building on the advances this program represents.

4) Lastly, we developed a novel screening approach to interrogate protein kinases for their client substrates. The Kinase Client (KiC) assay uses synthetic peptides in solution and quantitative mass spectrometry to screen for kinase clients en masse.
We initially validated this approach using pyruvate dehydrogenase kinase and a calcium-dependent protein kinase, and then expanded it to screen for clients using a library of 2,100 synthetic peptides corresponding to in vivo phosphopeptides from various Arabidopsis phosphoproteomic studies. This library has been screened with more than 70 different protein kinases, both internally and for external collaborators. The data suggest distinct substrate specificity even among closely related members of protein kinase subfamilies. This may offer opportunities for biochemical discovery of kinase function, particularly in instances where molecular genetic approaches fail to reveal a phenotype.
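The MUSite benchmark in section 3 relied on ROC curve analysis. As a generic illustration, and not the authors' benchmarking code, the area under the ROC curve (AUC) can be computed as the probability that a randomly chosen true phosphorylation site receives a higher prediction score than a randomly chosen non-site, with ties counted as one half. In this sketch, `roc_auc` is a hypothetical helper (not part of MUSite), and the two predictors' scores and labels are invented for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve (Mann-Whitney formulation).

    labels: 1 for an experimentally verified phosphorylation site, 0 for a non-site.
    Returns the fraction of (site, non-site) pairs in which the site scores higher,
    counting tied pairs as one half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one site and one non-site")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical benchmark: six candidate residues, three of them true sites.
    labels = [1, 1, 0, 1, 0, 0]
    predictor_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    predictor_b = [0.6, 0.4, 0.5, 0.7, 0.8, 0.1]
    print(f"A: {roc_auc(predictor_a, labels):.3f}")  # A: 0.889
    print(f"B: {roc_auc(predictor_b, labels):.3f}")  # B: 0.556
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of sites from non-sites. The pairwise loop is quadratic in the number of examples but easy to verify; a sort-based rank computation gives the same value more efficiently on large benchmark sets.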

National Science Foundation (NSF)
Division of Integrative Organismal Systems (IOS)
Program Officer: Diane Jofuku Okamuro
University of Missouri-Columbia, United States