Biomedical research is an important branch of science that studies biological processes and seeks to identify, prevent, and cure diseases. This research forms the pathway to the discovery of new medicines and new therapies. As such, biomedical research is crucial to advancing national health and prosperity. Because research groups and biomedical labs are geographically distributed, collaborative science plays a very important role in biomedical research. Galaxy is an open-source, web-based framework used by more than 20,000 researchers worldwide for conducting research in many application domains, the most prominent of which is biomedical research. It provides a web-based environment in which scientists perform various computational analyses on their data, exchange results from these analyses, explore new research concepts, facilitate student training, and preserve their results for future use. Galaxy currently runs on a large variety of high-performance computing (HPC) platforms, including local clusters, supercomputers in national labs, public datacenters, and the cloud. Unfortunately, while most of these systems supplement conventional CPUs with significant accelerator capabilities (in the form of Graphics Processing Units (GPUs) and/or Field-Programmable Gate Arrays (FPGAs)), the current Galaxy implementation does not take advantage of these powerful accelerators. This project enhances the Galaxy framework so that it can take full advantage of the tremendous computational capabilities offered by GPUs and FPGAs. By doing so, the important applications running under Galaxy experience significant speedups, thereby accelerating scientific discoveries.
This project consists of four complementary tasks, which follow a logical progression: Task-I focuses on redesigning existing Galaxy tools with GPU/FPGA support and integrating them into Galaxy tool-chains; Task-II provides containerization support for the tools and accelerator-aware orchestration for running Galaxy on cloud platforms; Task-III implements specific policy-driven scheduling schemes for Task-I and Task-II; and finally, Task-IV redesigns Galaxy storage to speed up execution and reduce bottlenecks related to data transfer. The proposed enhancements to Galaxy enable the integration of innovation with discovery by providing a state-of-the-art experimental platform to a larger community of researchers across several disciplines. On the broader impact and outreach/educational front, this project improves the performance and energy efficiency of Galaxy tools and applications and greatly improves the productivity of a typical Galaxy user; that is, the main beneficiaries of this project are the thousands of members of the existing Galaxy community. However, this project also (i) helps existing GPU- and FPGA-based (non-Galaxy) applications start using Galaxy, thereby taking full advantage of all existing toolsets within the framework, (ii) enables Galaxy tools to take better advantage of emerging cluster scheduling capabilities, and (iii) creates a synergy with concurrent Galaxy-related efforts and existing infrastructure efforts the PIs are involved with, to further expedite scientific discoveries. As such, the enhanced Galaxy system support will have a broad societal impact.
On the education side, the project involves under-represented groups in computer science as well as in bioinformatics, outreach to undergraduates, various K-12-related activities (Science-U, CSATS, VIEW), and engagement with researchers in other disciplines (e.g., natural language processing, image processing, drug discovery, and cosmology) via a workshop open to the Galaxy community.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.