Many initiatives have been launched to ensure that metabolomics data becomes publicly accessible. Despite this growing availability, the data is not being reused. One of the main limitations on metabolomics data reuse and cross-comparison is the lack of a unifying format and of methods for comparing multiple data sets, including data sets collected on different instruments and with different methods, as is done with UniFrac for microbial sequencing. UniFrac is a distance metric that takes phylogenetic relationships into account. Our goal with this project is threefold: 1) convert all public data into a unifying format; 2) make all data with MS/MS information available as living data in GNPS (http://gnps.ucsd.edu), where knowledge about the chemistry associated with the data is automatically updated and relayed to subscribers to the data; 3) create ChemiFrac, the UniFrac equivalent for metabolomics. Here we will use molecular networking as our analogue of the phylogenetic relationship measure, thus enabling global comparisons of data sets that we expect will work even when different extractions and instruments are used.
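To make the UniFrac idea concrete, the following is a minimal sketch of the standard unweighted UniFrac distance on a toy tree: the branch length leading to only one of two samples, divided by the branch length leading to either sample. ChemiFrac itself is not specified here, so treating the leaves as molecules and the tree as something derived from a molecular network is an assumption made purely for illustration. The function name, node names, and branch lengths are all hypothetical.

```python
def unweighted_unifrac(parent, length, sample_a, sample_b):
    """Unweighted UniFrac distance between two samples on a rooted tree.

    parent: maps each non-root node to its parent node.
    length: maps each non-root node to the length of the branch above it.
    sample_a, sample_b: sets of leaf names observed in each sample.
    """
    def covered_branches(sample):
        # Collect every branch on the path from each observed leaf to the root.
        # A branch is identified by the node at its lower (child) end.
        branches = set()
        for leaf in sample:
            node = leaf
            while node in parent:  # the root has no parent entry
                branches.add(node)
                node = parent[node]
        return branches

    a = covered_branches(sample_a)
    b = covered_branches(sample_b)
    total = sum(length[n] for n in a | b)   # branches seen by either sample
    if total == 0:
        return 0.0
    unique = sum(length[n] for n in a ^ b)  # branches seen by exactly one sample
    return unique / total


# Hypothetical toy tree: leaves m1..m3 stand in for molecules (an assumption).
parent = {"X": "root", "Y": "root", "m1": "X", "m2": "X", "m3": "Y"}
length = {"X": 1.0, "Y": 1.0, "m1": 0.5, "m2": 0.5, "m3": 0.5}

d = unweighted_unifrac(parent, length, {"m1", "m2"}, {"m2", "m3"})
print(round(d, 4))
```

In this toy case the two samples share the branches above m2 and X, so the distance falls between 0 (identical samples) and 1 (samples with no shared branches).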
Metabolomics is widely used in clinical and fundamental biology research. Public metabolomics data is stored but not yet reused. Here we develop strategies to unify the data format and to enable cross-comparisons of data sets, both key steps toward reusing data in the public domain.
Scheubert, Kerstin; Hufsky, Franziska; Petras, Daniel et al. (2017) Significance estimation for large scale metabolomics annotations by spectral matching. Nat Commun 8:1494
Garg, Neha; Wang, Mingxun; Hyde, Embriette et al. (2017) Three-Dimensional Microbiome and Metabolome Cartography of a Diseased Human Lung. Cell Host Microbe 22:705-716.e4
Hartmann, Aaron C; Petras, Daniel; Quinn, Robert A et al. (2017) Meta-mass shift chemical profiling of metabolomes from coral reefs. Proc Natl Acad Sci U S A 114:11685-11690