This is a proposal to implement a system that encompasses data annotation, data sharing and advanced analysis of established and new data sets by the research community. The data sets are all produced by an NIH-supported project, R01 ES028263, that investigates the mechanistic aspects of arsenic carcinogenicity. An underlying strength of this research program is the production of global data sets in the same model system for gene expression, transcription factor activity, protein abundance and metabolite abundance. The goal of this project is to enable intuitive retrieval of the data sets and exact reproduction of the analysis workflows and pipeline. A new team of informaticians, data scientists and computational scientists with programming expertise has been brought together to achieve the aims of this proposal. The system will have two components: first, a data catalog that will serve as a gateway for finding raw and intermediate data files, including non-omics data such as western blots and flow cytometry; and second, a Galaxy server that will support data reanalysis as well as data set integration. We will implement an experimental metadata annotation system similar to the NCBI GEO report system. Using that metadata, our data catalog will allow a user to filter data files based on variables such as sample type, molecular analyte or treatment, and will provide links to the selected files, which will be stored in a mixed system of local storage and national repositories. The Galaxy server will allow a user to retrieve data via the data gateway and reprocess it using workflows we provide. This project will meet the goals for FAIR (Findable, Accessible, Interoperable and Reusable) data sharing and, importantly, will meet them while integrating the analysis of multiple data types.
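
The catalog filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the project's implementation: the `CatalogEntry` record, its field names, the `filter_catalog` helper and the placeholder links are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One annotated data file. All field names are illustrative assumptions."""
    sample_type: str
    analyte: str
    treatment: str
    location: str  # placeholder link to local storage or a national repository


def filter_catalog(entries, **criteria):
    """Return the entries whose metadata fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog records; values and links are placeholders, not real data.
CATALOG = [
    CatalogEntry("cancer stem-like cells", "RNA", "arsenic", "placeholder://rna-seq-file"),
    CatalogEntry("cancer stem-like cells", "protein", "arsenic", "placeholder://proteomics-file"),
    CatalogEntry("lung cancer cells", "RNA", "untreated", "placeholder://rna-seq-control"),
]

# Filter on molecular analyte and treatment, then collect links to the selected files.
selected = filter_catalog(CATALOG, analyte="RNA", treatment="arsenic")
links = [entry.location for entry in selected]
```

In this sketch each filter variable is an exact-match field on a flat record; a real catalog would more likely query a metadata database, but the user-facing idea (choose variable values, receive links) is the same.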

Public Health Relevance

This Supplement Program application aims to enable intuitive retrieval of the collected omics data sets and exact reproduction of the analysis workflows and pipeline by establishing a platform that can be shared by the research community. The collected omics data include metabolomics, proteomics, ChIP-seq and RNA-seq data from arsenic (As3+)-induced cancer stem-like cells (CSCs) and from CRISPR-Cas9-based mdig gene knockout lung cancer and breast cancer cells. In addition, DNA methylation-seq, RNA 5mC and m6A-seq, etc., will be available and integrated into the platform soon.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Environmental Health Sciences (NIEHS)
Type
Research Project (R01)
Project #
3R01ES028263-03S1
Application #
9860050
Study Section
Systemic Injury by Environmental Exposure (SIEE)
Program Officer
Shaughnessy, Daniel
Project Start
2017-08-15
Project End
2022-06-30
Budget Start
2019-08-01
Budget End
2020-06-30
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Wayne State University
Department
Pharmacology
Type
Schools of Pharmacy
DUNS #
001962224
City
Detroit
State
MI
Country
United States
Zip Code
48202
Thakur, Chitra; Chen, Bailing; Li, Lingzhi et al. (2018) Loss of mdig expression enhances DNA and histone methylation and metastasis of aggressive breast cancer. Signal Transduct Target Ther 3:25