A major limitation in plant research is the difficulty of accurately identifying and quantifying proteins in tissues and organisms. Mass spectrometry is a technology that allows scientists to identify thousands of proteins from a tissue or organism. The collective set of all proteins is called the "proteome". This project will generate an atlas of identified proteins in the model plant species Arabidopsis thaliana, commonly named "thale cress", using publicly available protein mass spectrometry studies from laboratories around the world. Such an atlas will allow the researchers to address a number of important biological questions about this plant proteome. Once the first protein atlas has been generated for Arabidopsis, the researchers will also generate atlases for two crop species, such as maize, rice, or soybean. This project will also educate and train young scientists in techniques for processing and managing mass spectrometry data to understand the proteome of plants.
The availability of a high-quality annotated genome and a wealth of publicly available mass spectrometry (MS)-based proteomics data makes Arabidopsis thaliana an ideal plant for mapping its proteome and improving genome annotation. The research team will integrate this information with other systems data to address a range of outstanding questions and hypotheses. Taking advantage of the Human PeptideAtlas proteome infrastructure developed by co-PI Deutsch and colleagues at ISB, this project will generate the first Arabidopsis PeptideAtlas based on hundreds of existing MS studies from laboratories around the world, collected through ProteomeXchange (www.proteomexchange.org/) and reanalyzed through a uniform processing pipeline. All matched MS-derived peptide data will be projected onto predicted primary protein sequences, with links to spectral, technical, and biological metadata. This PeptideAtlas and its metadata will serve to address several biological questions and hypotheses, such as which mRNA isoforms are supported by proteomics evidence and whether there is evidence for intra-protein cross-talk or competition between post-translational modifications (PTMs). The Arabidopsis PeptideAtlas will be linked to TAIR (www.arabidopsis.org/) and other web-based resources. The complete dataset will be publicly available for download, further facilitating data mining by the scientific community and amplifying the impact of this project. Once the first few builds are released for Arabidopsis, and following community feedback and interactions, PeptideAtlas building will be extended to other plant species of societal and/or economic importance for which extensive and comprehensive MS-based proteomics data are available (e.g., maize, tomato, rice, soybean).
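As an illustration of what projecting matched peptides onto a predicted protein sequence involves, the following is a minimal sketch under stated assumptions, not the PeptideAtlas pipeline itself. It assumes exact string matching of unmodified peptide sequences against a single protein sequence; the names `PeptideHit`, `project_peptides`, and `sequence_coverage`, as well as the example sequences, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PeptideHit:
    """One placement of a peptide on a protein (hypothetical helper type)."""
    peptide: str
    start: int  # 0-based index of the first matched residue
    end: int    # exclusive end index


def project_peptides(protein_seq: str, peptides: Iterable[str]) -> List[PeptideHit]:
    """Locate every exact occurrence of each peptide in the protein sequence.

    Assumption: peptides are plain amino-acid strings without modifications,
    and matching is exact (no isoleucine/leucine equivalence, no mismatches).
    """
    hits: List[PeptideHit] = []
    for pep in peptides:
        if not pep:
            continue  # skip empty strings, which would match everywhere
        pos = protein_seq.find(pep)
        while pos != -1:
            hits.append(PeptideHit(pep, pos, pos + len(pep)))
            # Continue searching one residue further to catch repeated matches.
            pos = protein_seq.find(pep, pos + 1)
    return hits


def sequence_coverage(protein_seq: str, hits: Iterable[PeptideHit]) -> float:
    """Fraction of residues covered by at least one projected peptide."""
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for hit in hits:
        for i in range(hit.start, hit.end):
            covered[i] = True
    return sum(covered) / len(protein_seq)


if __name__ == "__main__":
    # Made-up sequences for illustration only; not real Arabidopsis data.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    observed = ["TAYIAK", "QISFVK", "GGGGGG"]
    placements = project_peptides(protein, observed)
    for hit in placements:
        print(hit.peptide, hit.start, hit.end)
    print(f"coverage: {sequence_coverage(protein, placements):.2f}")
```

In a real build, each placement would also carry the links to spectral, technical, and biological metadata described above; the sketch keeps only coordinates so that the projection and coverage steps stay easy to follow.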
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.