A suite of functioning scientific use cases is a critical need for the development of the cloud commons. Use cases can flesh out requirements, identify gaps and limitations of current infrastructure, and inspire further investment by developers and funders. However, absent significant pre-existing infrastructure, it can be challenging to implement a Commons use case that truly exercises infrastructure and can point the way to future development. This is the chicken-and-egg problem that the Phase I Pilot is designed to tackle. In this proposal, we identify a common biological use case (large-scale analysis of sequencing data) and propose combining existing pieces of technology to implement a solution for Stage I of the Pilot Phase. All of our technology already exists, functions, is public, and is open source. Moreover, our team embodies both scientific expertise and software engineering expertise in the same people, uses modern collaboration technologies, has a strong history in open science and open source, and is dedicated to a future of highly reproducible and repeatable biomedical data analysis. In particular, our team has considerable practical experience in cloud computing, multicloud execution, open source software engineering, and reproducible research. The outcome of our proposal will be a functional (if minimal) solution that bioinformaticians can use to execute large-scale expression (Salmon/CoGAPS) and structural variant (VariationHunter) analysis. Biologists will be able to explore the results interactively using the existing ADAGE server, which will soon be well integrated. We also intend to use our significant expertise in training, documentation, and open source community development to develop and deliver training materials to developers, drawing on our own experiences but, more importantly, on those of the Phase I Consortium.
The primary goal of this training is to coalesce an informed and engaged developer community around the Commons that includes not only Consortium members but also members of the broader bioinformatics community. This training will serve as one of several ways to integrate biomedical data science community feedback into the Commons.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Project #
3OT3OD025465-01S1
Application #
9672004
Study Section
Data Coordination, Mapping, and Modeling (DCMM)
Program Officer
Kutkat, Lora
Project Start
2017-09-30
Project End
2019-03-31
Budget Start
2017-09-30
Budget End
2019-03-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of California Davis
Department
Veterinary Sciences
Type
Schools of Veterinary Medicine
DUNS #
047120084
City
Davis
State
CA
Country
United States
Zip Code
95618