Cancer genomics resources are growing at an unprecedented pace. However, a comprehensive analysis of the cancer genome remains a daunting challenge. This is in part due to the difficulties in visualizing, integrating, and analyzing cancer genomics data with current technologies. We propose to develop a cloud-based platform that empowers researchers to host, visualize, and analyze their own data. The platform is composed of a set of Cancer Analytics Virtual Machines (CAVMs). The main component of each CAVM is a data server, which stores user data and serves it to applications, such as the UCSC Cancer Genomics Browser, for data visualization. The second component is a modified Galaxy workflow system that provides data analysis capability. UCSC's suite of tools for next-generation sequencing data analysis and pathway inference will be prepackaged with the system. The two components will be highly integrated to allow tightly coupled cycles of data visualization and analysis. The data server component will be modular, so that it can provide data independently to applications other than the Cancer Browser and Galaxy. We will deliver virtual machine images that can be easily launched in a commercial cloud such as Amazon, or installed within a user's own institution. The CAVM also gives users a way to integrate with external large-scale databases. We will deliver a UCSC CAVM that other CAVM instances can connect to, providing authorized access to data from the UCSC cancer genomics data repository. The system allows the dynamic formation of new datasets composed of data slices from multiple sources. This ability to combine data into larger samples will provide the statistical power to allow discoveries that would otherwise not be possible.
We aim to eliminate, or significantly reduce, the overhead of system configuration and software installation. Our tools will give users access to a cloud-based cluster computing environment, which will make sophisticated, computationally intensive analyses accessible to researchers who might not otherwise have access to compute servers. The software platform we develop can be used by individual bench biologists, and also by large projects to serve data to individual users or to other projects. This design has the potential to form an expansive federated database accessible through the same software interface.
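
To illustrate the idea of forming a new dataset from data slices held by multiple sources, the following is a minimal sketch only. The `DataSlice` class, the `combine_slices` function, the source names, and the sample values are all hypothetical; they are not part of the CAVM software or any UCSC API, and the real system may combine data quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class DataSlice:
    """A hypothetical slice of a genomic matrix held by one data source."""
    source: str
    # values[sample_id][feature] -> measurement (illustrative numbers only)
    values: dict = field(default_factory=dict)


def combine_slices(slices, features):
    """Merge slices into one dataset restricted to the requested features.

    Sample IDs are prefixed with the source name so that samples from
    different servers cannot collide.
    """
    combined = {}
    for s in slices:
        for sample_id, row in s.values.items():
            # Keep only samples that have every requested feature.
            if all(f in row for f in features):
                combined[f"{s.source}:{sample_id}"] = {f: row[f] for f in features}
    return combined


# Two assumed sources: a user's own server and a remote repository.
local = DataSlice("local", {"S1": {"TP53": 0.2, "EGFR": 1.4}})
remote = DataSlice("ucsc", {"S1": {"TP53": -0.7, "EGFR": 0.3},
                            "S2": {"TP53": 1.1}})

dataset = combine_slices([local, remote], ["TP53", "EGFR"])
print(sorted(dataset))
```

In this sketch the combined dataset contains only samples that carry every requested feature, which is one simple way a larger, analysis-ready sample set could be assembled from several sources.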

Public Health Relevance

Currently, clinicians and bench biologists typically depend on external collaborators for data analysis. The proposed system will provide these scientists with data analysis and visualization methods that are both powerful and easy to use. This will accelerate research in the understanding and treatment of cancer, the second-leading cause of death in the U.S.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Li, Jerry
University of California Santa Cruz
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
Santa Cruz
United States
Cherniack, Andrew D; Shen, Hui; Walter, Vonn et al. (2017) Integrated Molecular Characterization of Uterine Carcinosarcoma. Cancer Cell 31:411-423
Cancer Genome Atlas Research Network; Albert Einstein College of Medicine; Analytical Biological Services et al. (2017) Integrated genomic and molecular characterization of cervical cancer. Nature 543:378-384
Vivian, John; Rao, Arjun Arkal; Nothaft, Frank Austin et al. (2017) Toil enables reproducible, open source, big biomedical data analyses. Nat Biotechnol 35:314-316
Fishbein, Lauren; Leshchiner, Ignaty; Walter, Vonn et al. (2017) Comprehensive Molecular Characterization of Pheochromocytoma and Paraganglioma. Cancer Cell 31:181-193
Cancer Genome Atlas Research Network; Linehan, W Marston; Spellman, Paul T et al. (2016) Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma. N Engl J Med 374:135-45
Speir, Matthew L; Zweig, Ann S; Rosenbloom, Kate R et al. (2016) The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 44:D717-25
Ceccarelli, Michele; Barthel, Floris P; Malta, Tathiane M et al. (2016) Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 164:550-63
Blau, C Anthony; Ramirez, Arturo B; Blau, Sibel et al. (2016) A Distributed Network for Intensive Longitudinal Monitoring in Metastatic Triple-Negative Breast Cancer. J Natl Compr Canc Netw 14:8-17
Zheng, Siyuan; Cherniack, Andrew D; Dewal, Ninad et al. (2016) Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma. Cancer Cell 29:723-736
Cancer Genome Atlas Network (2015) Genomic Classification of Cutaneous Melanoma. Cell 161:1681-96
