Intelligent deployment of containerized bioinformatics workflows on the cloud

Yeung-Rhee, Ka

Abstract

Cloud computing has emerged as a promising solution to the challenges of big data. Public cloud vendors provide computing as a utility, enabling users to pay only for the resources they actually use. In this application, we will develop methods and tools that enable biomedical researchers to optimize the costs of cloud computing when analyzing biomedical big data. Infrastructure-as-a-Service (IaaS) clouds deliver computing on demand to end users, so that cloud resources can be rapidly provisioned and scaled to meet computational and performance requirements. In addition, dynamic, intelligent allocation of cloud computing resources has great potential to both improve performance and reduce hosting costs. Unfortunately, determining the most cost-effective and efficient way to deploy modules on the cloud is nontrivial because of the plethora of cloud vendors, each providing different types of virtual machines with different capabilities, performance trade-offs, and pricing structures. Moreover, modern bioinformatics workflows consist of multiple modules, applications, and libraries, each with its own set of software dependencies. Software containers package binary executables and scripts into modules together with their software dependencies. Because containers compartmentalize software dependencies, modules implemented as containers can be mixed and matched to create workflows that give identical results on any platform. The high degree of reproducibility and flexibility of software containers makes them ideal instruments for disseminating complex bioinformatics workflows. Our overarching goal is to deliver the latest technological advances in containers and cloud computing to the typical biomedical researcher who works with big data but has limited resources. Specifically, we will develop a user-friendly drag-and-drop interface that enables biomedical researchers to build and edit containerized workflows.
Most importantly, users can choose to deploy and scale selected modules in the workflow on cloud computing platforms in a transparent, yet guided fashion, to optimize cost and performance.
Our aim is to provide a federated approach that leverages resources from multiple cloud vendors. We have assembled an interdisciplinary team of scientists with expertise in bioinformatics, cloud and distributed computing, and machine learning. As part of this application, we will work closely with end users who routinely generate and analyze RNA-seq data. We will illustrate how our containerized, cloud-enabled methods and tools benefit bioinformatics analyses.

Public Health Relevance

Cloud computing has emerged as a promising solution to the challenge of analyzing the diverse and massive data sets generated to advance our understanding of health and disease. We will develop methods and tools to build and intelligently deploy modular, cloud-enabled bioinformatics workflows. These tools will allow the biomedical community to optimize the costs associated with cloud computing and will facilitate the replication of scientific results.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM126019-02
Application #: 9625823
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2018-02-01
Project End: 2021-01-31
Budget Start: 2019-02-01
Budget End: 2020-01-31
Support Year: 2
Fiscal Year: 2019

Institution

Name: University of Washington
Type: Organized Research Units
DUNS #: 605799469

City: Seattle
State: WA
Country: United States
Zip Code: 98195

Related projects


NIH 2020 R01 GM	Intelligent deployment of containerized bioinformatics workflows on the cloud Yeung-Rhee, Ka Yee / University of Washington
NIH 2019 R01 GM	Intelligent deployment of containerized bioinformatics workflows on the cloud Yeung-Rhee, Ka Yee / University of Washington
NIH 2018 R01 GM	Intelligent deployment of containerized bioinformatics workflows on the cloud Yeung-Rhee, Ka Yee / University of Washington

Publications

Zhang, Pai; Hung, Ling-Hong; Lloyd, Wes et al. (2018) Hot-starting software containers for STAR aligner. Gigascience 7:
