The recent proliferation of next-generation sequencing (NGS)-based methods for the analysis of gene expression, chromatin structure, and protein-DNA interactions has created tremendous opportunities for gaining novel insights into basic biology, health, and disease. However, analysis of the resulting data requires computational expertise that many traditional biologists do not possess. Hence, when dealing with genomics data, the majority of biologists require the help of bioinformaticians even for simple tasks, which places these exciting methods beyond the reach of most life scientists. This proposal from Datirium, LLC, a start-up company from Cincinnati, OH, describes a plan to create SciDAP (Scientific Data Analysis Platform), a novel, user-friendly multi-omics data analysis platform that will allow biologists to analyze the data themselves and enable collaboration between biologists and bioinformaticians. Datirium was founded by this application's PIs, Artem Barski, PhD, and Andrey Kartashov, initially to assist users with the installation and support of BioWardrobe. BioWardrobe, a user-friendly open-source integrative genomics analysis platform, was developed by the Barski laboratory at Cincinnati Children's Hospital Medical Center (CCHMC) in 2015. It has been used by more than 40 CCHMC laboratories to process more than 8000 experiments and has been applied in more than 40 publications. In addition, Datirium has installed and continues to maintain BioWardrobe servers at several external research centers. For Datirium, BioWardrobe served as a Minimum Viable Product (MVP) that confirmed the need and the existence of a market niche for such software, but also highlighted several design limitations. Key among them was the difficulty of adding new pipelines or modifying existing ones: because of the tight coupling between pipeline and user interface, any such change required modifications at all levels of the software.
Unfortunately, the same limitation exists for all user-friendly bioinformatics tools. Given that there are more than 150 NGS-based methods and many ways to process the data, this explains why a universal and user-friendly data analysis platform does not yet exist. We hypothesize that we can create a data analysis platform that is both universal and user-friendly by including interface instructions in the computational pipelines themselves. The platform will use these instructions to create a graphical interface for users. Specifically, we are using containerized pipelines developed using the Common Workflow Language (CWL). CWL makes it possible to describe tools, pipelines, and the computational environment, making these pipelines both portable and reproducible. On top of CWL, Datirium has developed a system of CWL extensions that describe, within the CWL workflows, how inputs and outputs are visualized. Importantly, our platform will increase the rigor of computational analysis by (i) making the analysis reproducible and auditable by bioinformaticians, owing to CWL pipeline portability and the recording of each step of the analysis as Research Objects; (ii) enabling collaboration between experimentalists and computational biologists by providing bioinformaticians with a way to direct the analysis flow and biologists with the convenience of a GUI; and (iii) including out-of-the-box pipelines with optimized parameters and actionable QC metrics that flag possible issues. In the first aim of this proposal, we will create a prototype of the platform. In the second aim, we will conduct usability testing with bioinformaticians and experimentalists to test whether our platform can accommodate diverse types of analysis in a user-friendly fashion. Specifically, in collaboration with Dr. Salomonis at CCHMC, we will test how easy it is to integrate two existing analysis routines into SciDAP: BS-Seq DNA methylation and scRNA-Seq. We will then work with biologists to ensure that the resulting interface is both user-friendly and enables collaboration with bioinformaticians. Successful completion of this project will provide the research community with a cutting-edge, flexible, and biologist-friendly data analysis platform.
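
The idea of carrying interface instructions inside the pipeline description can be illustrated with a minimal Python sketch. It is not SciDAP's implementation: the `workflow` dictionary merely stands in for a parsed CWL document, and the `ui` key, the widget and view names, and the `build_form` and `build_views` helpers are all hypothetical names invented for this illustration. The point is only that a generic front end can derive its input form and output viewers from the workflow alone, so adding a pipeline would not require changes to the interface code.

```python
# Hypothetical workflow description: a plain dict standing in for a parsed CWL
# document. The "ui" keys are invented for this sketch and are NOT SciDAP's
# actual CWL extension syntax.
workflow = {
    "inputs": {
        "fastq_file": {
            "type": "File",
            "ui": {"label": "FASTQ file", "widget": "file_picker"},
        },
        "threads": {
            "type": "int",
            "default": 4,
            "ui": {"label": "CPU threads", "widget": "number"},
        },
        # No "ui" hint: this input is set by a bioinformatician, not shown in the form.
        "genome_index": {"type": "Directory"},
    },
    "outputs": {
        "alignment_stats": {
            "type": "File",
            "ui": {"label": "Alignment statistics", "view": "table"},
        },
        "coverage_track": {
            "type": "File",
            "ui": {"label": "Coverage", "view": "genome_browser"},
        },
    },
}


def build_form(workflow):
    """Return one form-field description per input that carries a UI hint."""
    fields = []
    for name, spec in workflow["inputs"].items():
        hint = spec.get("ui")
        if hint is None:
            continue  # inputs without a hint stay hidden from the form
        fields.append(
            {
                "name": name,
                "label": hint["label"],
                "widget": hint["widget"],
                "default": spec.get("default"),
            }
        )
    return fields


def build_views(workflow):
    """Map each annotated output to the (hypothetical) viewer that renders it."""
    return {
        name: spec["ui"]["view"]
        for name, spec in workflow["outputs"].items()
        if "ui" in spec
    }


if __name__ == "__main__":
    for field in build_form(workflow):
        print(field)
    print(build_views(workflow))
```

In this sketch, supporting a new analysis method means writing a new workflow description with its own hints. The form and viewer code stay unchanged, which is the decoupling the proposal aims for.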

Public Health Relevance

Conventional biomedical scientists require user-friendly bioinformatics tools to accelerate their omics research. We will create a powerful and flexible data analysis platform that enables experimentalists and computational biologists to collaborate and to analyze, visualize, and integrate diverse genomics datasets, including single-cell RNA-Seq and DNA methylation.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Small Business Technology Transfer (STTR) Grants - Phase I (R41)
Project #
1R41HG011219-01A1
Application #
10081764
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sofia, Heidi J
Project Start
2020-09-14
Project End
2021-08-31
Budget Start
2020-09-14
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Name
Datirium, LLC
DUNS #
064815424
City
Cincinnati
State
OH
Country
United States
Zip Code
45226