This proposal describes a system for data annotation, data sharing and advanced analysis of established and new data sets by the research community. All data sets were produced by the NIH-supported project R01 ES028263, which investigates the mechanistic aspects of arsenic carcinogenicity. An underlying strength of this research program is the production of global data sets in the same model system for gene expression, transcription factor activity, protein abundance and metabolite abundance. The goal of this project is to enable intuitive retrieval of the data sets and exact reproduction of the analysis workflows and pipelines. A new team of informaticians, data scientists and computational scientists with programming expertise has been brought together to achieve the aims of this proposal. The system will have two components: first, a data catalog that serves as a gateway for finding raw and intermediate data files, including non-omics data such as western blots and flow cytometry; and second, a Galaxy server that supports data reanalysis and data set integration. We will implement an experimental metadata annotation system similar to the NCBI GEO report system. Using that metadata, our data catalog will allow a user to filter data files by variables such as sample type, molecular analyte or treatment, and will provide links to the selected files, which will be stored in a mixed system of local storage and national repositories. The Galaxy server will allow a user to retrieve data via the data gateway and reprocess it using workflows we provide. This project will meet the goals for FAIR (Findable, Accessible, Interoperable and Reusable) data sharing and, importantly, will do so while integrating the analysis of multiple data types.
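As an illustration of the metadata-driven filtering described above, the following is a minimal sketch in Python, not the project's implementation. The `CatalogEntry` fields, the `filter_entries` helper, and every file location shown are hypothetical placeholders chosen for this example; the actual metadata schema and storage layout are not specified in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One data file plus the metadata used for filtering (hypothetical schema)."""
    file_url: str      # link to local storage or a national repository
    sample_type: str   # e.g. "CSC" for cancer stem-like cells
    analyte: str       # molecular analyte, e.g. "RNA", "protein", "metabolite"
    treatment: str     # e.g. "arsenic" or "control"


def filter_entries(entries, **criteria):
    """Return the entries whose metadata match every field=value pair given."""
    return [
        entry
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


# Placeholder records; the URLs are invented for illustration only.
CATALOG = [
    CatalogEntry("local://rnaseq/csc_arsenic_rep1.fastq.gz", "CSC", "RNA", "arsenic"),
    CatalogEntry("local://rnaseq/csc_control_rep1.fastq.gz", "CSC", "RNA", "control"),
    CatalogEntry("local://proteomics/csc_arsenic_rep1.raw", "CSC", "protein", "arsenic"),
]

if __name__ == "__main__":
    # Select RNA files from arsenic-treated samples and print their links.
    for entry in filter_entries(CATALOG, analyte="RNA", treatment="arsenic"):
        print(entry.file_url)
```

In a design like this, the links returned by the filter would be what a user hands to the Galaxy server for reprocessing, regardless of whether each file resides locally or in a national repository.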
This Supplement Program application aims to enable intuitive retrieval of the collected omics data sets and exact reproduction of the analysis workflows and pipelines by establishing a platform that can be shared by the research community. The collected omics data include metabolomics, proteomics, ChIP-seq and RNA-seq data from arsenic (As3+)-induced cancer stem-like cells (CSCs) and from CRISPR-Cas9-based mdig gene knockout lung cancer and breast cancer cells. In addition, further data types, including DNA methylation-seq, RNA 5mC and m6A-seq, will be made available and integrated into the platform soon.
Thakur, Chitra; Chen, Bailing; Li, Lingzhi et al. (2018) Loss of mdig expression enhances DNA and histone methylation and metastasis of aggressive breast cancer. Signal Transduct Target Ther 3:25