Cancer genomics resources are growing at an unprecedented pace, yet comprehensive analysis of the cancer genome remains a daunting challenge. This is due in part to the difficulty of visualizing, integrating, and analyzing cancer genomics data with current technologies. We propose to develop a cloud-based platform that empowers researchers to host, visualize, and analyze their own data. The platform is composed of a set of Cancer Analytics Virtual Machines (CAVMs). The main component of each CAVM is a data server that stores user data and serves it to applications, such as the UCSC Cancer Genomics Browser, for visualization. The second component is a modified Galaxy workflow system that provides data analysis capability. UCSC's suite of tools for next-generation sequencing data analysis and pathway inference will be prepackaged with the system. The two components will be highly integrated to allow tightly coupled cycles of data visualization and analysis. The data server will be modular, so that it can provide data independently to applications other than the Cancer Browser and Galaxy. We will deliver virtual machine images that can be easily launched in a commercial cloud such as Amazon or installed within a user's own institution. The CAVM also lets users integrate with external large-scale databases. We will deliver a UCSC CAVM to which other CAVM instances can connect, providing authorized access to data from the UCSC cancer genomics data repository. The system allows the dynamic formation of new datasets composed of data slices from multiple sources. This ability to combine data into larger samples will provide the statistical power for discoveries that would otherwise not be possible.
We aim to eliminate, or significantly reduce, the overhead of system configuration and software installation. Our tools will give users access to a cloud-based cluster computing environment, making sophisticated, computationally intensive analyses available to researchers who might not otherwise have access to compute servers. The software platform we develop can be used by individual bench biologists, and also by large projects to serve data to individual users or to other projects. This design has the potential to form an expansive federated database accessible through a single software interface.
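
The idea of forming a new dataset from data slices held by multiple sources can be illustrated with a small sketch. This is not the CAVM data server's actual interface, which the abstract does not specify. The `DataSlice` type, the `combine_slices` function, the source names, and the gene values are all hypothetical and chosen only for illustration. The sketch assumes each slice maps sample IDs to per-feature measurements and that only features present in every sample are kept.

```python
from dataclasses import dataclass


@dataclass
class DataSlice:
    """Hypothetical slice of data served by one source (not a real CAVM type)."""
    source: str   # illustrative name of the server that holds the slice
    values: dict  # sample_id -> {feature_name: measurement}


def combine_slices(slices):
    """Merge slices into one dataset, keeping only features shared by all samples."""
    shared = None
    for s in slices:
        for features in s.values.values():
            keys = set(features)
            shared = keys if shared is None else shared & keys
    if not shared:
        return {}

    combined = {}
    for s in slices:
        for sample_id, features in s.values.items():
            # Prefix with the source so identical sample IDs do not collide.
            key = f"{s.source}:{sample_id}"
            combined[key] = {name: features[name] for name in sorted(shared)}
    return combined


# Illustrative data only: one local slice and one slice from a remote server.
local = DataSlice("local", {"s1": {"TP53": 1.0, "EGFR": 2.0}})
remote = DataSlice("remote", {"s1": {"TP53": 0.5, "KRAS": 3.0},
                              "s2": {"TP53": 0.1, "KRAS": 1.0}})
combined = combine_slices([local, remote])
print(len(combined), sorted(combined))
```

Restricting the result to shared features is one possible design choice; a real system might instead keep all features and mark missing values, depending on the analysis.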

Public Health Relevance

Currently, clinicians and bench biologists typically depend on external collaborators for data analysis. The proposed system will provide these scientists with data analysis and visualization methods that are both powerful and easy to use. This will accelerate research in the understanding and treatment of cancer, the second-leading cause of death in the U.S.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Li, Jerry
University of California Santa Cruz
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
Santa Cruz
United States
Cancer Genome Atlas Research Network; Linehan, W Marston; Spellman, Paul T et al. (2016) Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma. N Engl J Med 374:135-45
Zheng, Siyuan; Cherniack, Andrew D; Dewal, Ninad et al. (2016) Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma. Cancer Cell 29:723-36
Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R et al. (2016) The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 44:D717-25
Blau, C Anthony; Ramirez, Arturo B; Blau, Sibel et al. (2016) A Distributed Network for Intensive Longitudinal Monitoring in Metastatic Triple-Negative Breast Cancer. J Natl Compr Canc Netw 14:8-17
Ceccarelli, Michele; Barthel, Floris P; Malta, Tathiane M et al. (2016) Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 164:550-63
Cancer Genome Atlas Network (2015) Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 517:576-82
Cancer Genome Atlas Network (2015) Genomic Classification of Cutaneous Melanoma. Cell 161:1681-96
(2015) Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med 372:2481-98
Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P et al. (2015) The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 43:D670-81
Goldman, Mary; Craft, Brian; Swatloski, Teresa et al. (2015) The UCSC Cancer Genomics Browser: update 2015. Nucleic Acids Res 43:D812-7

Showing the most recent 10 out of 18 publications