Improving Deposition Quality and FAIRness of Metabolomics Workbench
PROJECT SUMMARY (30 lines)
The practical reuse of genomics and transcriptomics datasets is well demonstrated, owing to universal gene identifiers that facilitate matching of features across these datasets, high feature coverage, standardized metadata and data deposition formats, and mature deposition quality and consistency. Metabolomics datasets, however, are much harder to reuse because of the lack of standardized metabolite feature identification, heterogeneity in feature coverage, and high variability in deposition quality and consistency. As a result, it is much harder both to find relevant metabolomics datasets in repositories like Metabolomics Workbench (MWbench) and to reuse these datasets effectively to generate and/or test hypotheses. To address these difficulties, deposition quality must be improved. Furthermore, methods that enable the effective search and harmonization of MWbench studies are needed, especially for integrative multi-omics analyses. We are the developers of the only available set of open-source tools for parsing, generating, and validating mwTab-formatted repository files. Our experience developing and using this open-source mwtab Python package makes us uniquely qualified to develop methods that improve both the deposition quality and the FAIRness of MWbench studies. We have also provided periodic feedback to MWbench, based on systematic evaluations of the repository, to help improve this growing public resource (2). Therefore, we propose to develop methods and open-source tools that will improve deposition quality and FAIRness of MWbench through the following specific aims:
Aim 1 : Enable comprehensive capture, deposition, and validation of metabolomics experimental data and metadata;
Aim 2 : Improve the FAIRness of Metabolomics Workbench while demonstrating effective multi-omics integration with the Genotype-Tissue Expression Project (GTEx).
The major innovations that this proposal will develop are: i) effective methods for capturing metadata from unstructured formats, ii) advanced search methods for relevant MWbench studies that can filter on metadata quality, iii) effective harmonization methods for MWbench studies, iv) a new omics integration approach to detect human gene-metabolite associations, and v) new tools that facilitate quicker, easier public deposition in mwTab format with high-quality metadata and InChI tags. The significance of this proposal lies in developing methods and tools that: a) comprehensively capture, validate, and deposit metadata-rich metabolomics data, b) improve the FAIRness of MWbench datasets, especially their reuse, c) enable integration of MWbench and GTEx datasets to generate biomedically relevant human gene-metabolite associations, and d) enable interpretation of gene-metabolite associations within molecular interaction networks. These new tools will enhance the utility and usage of Metabolomics Workbench while demonstrating multi-omics integration with the Genotype-Tissue Expression Project.
(3 sentences)
We will develop the methods and tools needed to effectively archive and use public biomedical datasets describing the thousands of small biomolecules (metabolomics) involved in human biology and disease. We will use these methods to demonstrate the integration and reuse of thousands of datasets in both the Metabolomics Workbench and the Genotype-Tissue Expression Project to derive new biomedical knowledge that describes how gene products relate to the small molecules generated in metabolism, i.e., the chemical processes of life.