Cancer research is now a data-driven discipline, but only a minority of cancer researchers are data scientists. This severely restricts our ability to study and cure the disease effectively. The far-reaching significance of our project lies in federating disparate data and computational resources to provide a unifying analysis platform for computational cancer research. We will extend the popular scientific workbench Galaxy (https://galaxyproject.org) so that it can integrate with the distributed data and compute resources that cancer researchers use and need, including those in the NCI Cancer Research Data Commons (NCRDC). Our Federated Galaxy system will allow users to seamlessly access NCRDC data across multiple resources. It will support multiple analysis scenarios tuned to the skills and computational requirements of individual researchers.
The aims of this project are:
Aim 1. Extend Galaxy for working with distributed cancer genomics and phenotypic data. This will enable Galaxy users to access both public and private cancer data regardless of their actual physical location. Best-practice approaches will be used for accessing restricted datasets.
Aim 2. Enhance Galaxy for context-aware, distributed cancer genomics analyses using shared workflow representations. This will enable Galaxy users to run genomics analyses on different clouds, ultimately reducing the time, cost, and data transfer associated with analyses.
Aim 3. Apply Federated Galaxy to precision oncology research. Workflows developed in this aim will leverage the technologies from Aims 1 and 2 to benchmark machine learning algorithms for predicting tumor phenotype and drug response. Interactive reports will summarize the benchmarking results and use ITCR visualizations for deep dives into them.
Our system will provide a single access point to distributed cancer datasets and will enable these data to be analyzed within a single portal in a way that satisfies multiple analysis scenarios and uses diverse computational resources. Finally, a cloud-centric Galaxy built for the NCRDC will substantially grow the community of users working with the GDC and the NCRDC, because Galaxy brings with it a vibrant worldwide community of users and developers numbering tens of thousands of scientists. These individuals will help tune the GDC and other NCRDC resources to the needs of real-life analysis scenarios and will enrich the set of tools accessible to cancer researchers.

Public Health Relevance

This project will develop a user-friendly scientific analysis workbench for analyzing cancer genomics data on the NCI Cancer Research Data Commons cloud platform. The workbench will democratize access to cloud-based cancer genomic analyses. It will also aid in precision cancer medicine by benchmarking and identifying the most accurate analytic methods for classifying tumors and predicting drug response.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
5U24CA231877-02
Application #
9783755
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Li, Jerry
Project Start
2018-09-11
Project End
2023-08-31
Budget Start
2019-09-01
Budget End
2020-08-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Oregon Health and Science University
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
096997515
City
Portland
State
OR
Country
United States
Zip Code
97239