Cloud computing has emerged as a promising solution to address the challenges of big data. Public cloud vendors provide computing as-a-utility enabling users to pay only for the resources that are actually used. In this application, we will develop methods and tools to enable biomedical researchers to optimize the costs of cloud computing when analyzing biomedical big data. Infrastructure-as-a-Service (IaaS) cloud provides computing as a utility, on-demand, to end users, enabling cloud resources to be rapidly provisioned and scaled to meet computational and performance requirements. In addition, dynamic intelligent allocation of cloud computing resources has great potential to both improve performance and reduce hosting costs. Unfortunately, determining the most cost-effective and efficient ways to deploy modules on the cloud is non- trivial, due to a plethora of cloud vendors, each providing different types of virtual machines with different capabilities, performance trade-offs, and pricing structures. In addition, modern bioinformatics workflows consist of multiple modules, applications and libraries, each with their own set of software dependencies. Software containers package binary executables and scripts into modules with their software dependencies. With containers that compartmentalize software dependencies, modules implemented as containers can be mixed and matched to create workflows that give identical results on any platform. The high degree of reproducibility and flexibility of software containers makes them ideal instruments for disseminating complex bioinformatics workflows. Our overarching goal is to deliver the latest technological advances in containers and cloud computing to a typical biomedical researcher with limited resources who works with big data. Specifically, we will develop a user-friendly drag-and-drop interface to enable biomedical researchers to build and edit containerized workflows. Most importantly, users can choose to deploy and scale selected modules in the workflow on cloud computing platforms in a transparent, yet guided fashion, to optimize cost and performance.
Our aim i s to provide a federated approach that leverages resources from multiple cloud vendors. We have assembled a team of interdisciplinary scientists with expertise in bioinformatics, cloud and distributed computing, and machine learning. As part of this application, we will work closely with end users who routinely generate and analyze RNA-seq data. We will illustrate how our containerized, cloud-enabled methods and tools will benefit bioinformatics analyses.

Public Health Relevance

Cloud computing has emerged as a promising solution to address the challenge of analyzing diverse and massive data generated to advance our understanding of health and diseases. We will develop methods and tools to build and intelligently deploy modular and cloud-enabled bioinformatics workflows. These tools will allow the biomedical community to optimize the costs associated with cloud computing and to facilitate the replication of scientific results.

Agency
National Institute of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM126019-01
Application #
9422475
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2018-02-01
Project End
2021-01-31
Budget Start
2018-02-01
Budget End
2019-01-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of Washington
Department
Type
Organized Research Units
DUNS #
605799469
City
Seattle
State
WA
Country
United States
Zip Code
98195
Zhang, Pai; Hung, Ling-Hong; Lloyd, Wes et al. (2018) Hot-starting software containers for STAR aligner. Gigascience 7: