The project will contribute MassIVE.quant, a novel data resource for quantitative mass spectrometry-based proteomics. Quantitative mass spectrometry characterizes proteins in complex biological mixtures with the highest available accuracy, sensitivity and throughput. Analysis of most such experiments involves identification of the peptides and proteins that generated the spectra, and relative quantification of changes in abundance between pre-defined conditions. While the identification workflows are now mature and ready for reproducible research, the quantitative workflows lag far behind. No repository can currently store the analysis results across all workflows, and it is often impossible for authors to provide their data in a form that allows independent evaluation and reuse. This undermines the reproducibility and the impact of these investigations. The project combines the prior expertise of the Bandeira lab in developing the Mass spectrometry Interactive Virtual Environment (MassIVE), a public repository for storing, documenting and re-analyzing mass spectra for identification, and the prior expertise of the Vitek lab in developing MSstats, a broad-scope collection of statistical methods and software for quantitative proteomic workflows. First, the project will fully document and annotate a medium-scale "training set" of quantitative investigations (which often rely on manual procedures), to develop standards for documenting and annotating the experiments with respect to the biological origins of the samples and the technological aspects of data acquisition and processing. Second, the project will design functionalities for complete and automated repository-wide re-analyses of the original investigations, using a limited number of "good practice" workflows. The re-analyses will fully preserve the provenance of the results, and will be used to further characterize potential pitfalls in the experimental designs and conclusions.
Finally, the project will place these investigations into a broader scientific context. It will design a query infrastructure that links each experiment to its peer investigations, i.e., investigations with similar biological or technological aspects, to provide insights into the consistency of the results. Continuing the extensive prior outreach efforts of the PIs, the results will be disseminated to a broad community of stakeholders, including proteomic scientists, tool developers, journal editors, trainees, and scientists interested in protein-level information. The project will shift the mass spectrometry-based research paradigm by creating a public resource that currently does not exist in any form. It will expand the technical capabilities of the field, ultimately allowing us to make more accurate statements about biological function.

Public Health Relevance

This project will contribute MassIVE.quant, a novel data resource for quantitative mass spectrometry-based proteomics. MassIVE.quant will design a repository structure to fully document quantitative proteomic investigations, implement their complete and automated re-analyses with standardized workflows, and link them to investigations with similar biological or technological aspects. The project will benefit a broad community of researchers, journal editors, and trainees, and will accelerate mass spectrometry-based discovery and methodological research.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM013115-02
Application #
9930148
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Vanbiervliet, Alan
Project Start
2019-06-01
Project End
2023-05-31
Budget Start
2020-06-01
Budget End
2021-05-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of California, San Diego
Department
Pharmacology
Type
Schools of Medicine
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093