The Cancer Genome Atlas (TCGA) set the standards for large-scale cancer genome projects worldwide. In the next phase, the National Cancer Institute (NCI) and its Center for Cancer Genomics are planning large-scale projects closely tied to clinical questions and trials. To analyze these data, the NCI is creating a Genome Data Analysis Network (GDAN) composed of different types of Genome Data Analysis Centers (GDACs). Central to this Network is a single Processing GDAC, which will take all the harmonized data stored in the NCI's Genomic Data Commons and perform higher-level integrated analyses to support both the Analysis Working Groups (AWGs) within the Network (which will be formed for each project to perform special analyses of the data and write manuscripts) and the entire biomedical research community. We propose to build the centralized Processing GDAC on top of our FireCloud platform, an infrastructure for running large-scale computation on the cloud in a rigorous and reproducible fashion. FireCloud development was based on our experience with Firehose, the Broad internal platform on which the standard TCGA data and analyses currently run. We propose to create and operate the GDAN Standard Workflow, incorporating tools actively developed and used within the GDAN and across the entire field, with particular emphasis on clinical tools. This Workflow will serve as the starting point for AWGs and set the highest standards of transparency, reproducibility and rigor for cancer genome analysis. The results of the Standard Workflow will be stored in a public database, made accessible via standard APIs, and used together with a continuously updated database of prior knowledge to create scientific reports that will be made available to the community before publication.
Finally, a major innovation is that AWG members will be able to log in to FireCloud and rerun the entire workflow, or parts of it, with their own parameters and subsets of the data, thus making the entire GDAN analysis fully reproducible and scalable. Our goals are therefore to: (1) create a global infrastructure for collaborative extreme-scale cancer analysis; (2) operate the Standard Workflows at scale; (3) rapidly and continuously evolve the Standard Workflows; and (4) create improved capabilities for reporting, exploring the results, clinical diagnostics and reproducibility.

Public Health Relevance

Our mission is to create the largest, most comprehensive workflow for analyzing cancer genome data together with clinical data. This workflow will propel discoveries of cancer genes, mutational mechanisms, and molecular subtypes, and of their association with response to treatment and other clinical parameters. These discoveries can then form the basis of future hypotheses and clinical trials.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Yang, Liming
Broad Institute, Inc.
United States
Liu, Yang; Sethi, Nilay S; Hinoue, Toshinori et al. (2018) Comparative Molecular Analysis of Gastrointestinal Adenocarcinomas. Cancer Cell 33:721-735.e8
Radovich, Milan; Pickering, Curtis R; Felau, Ina et al. (2018) The Integrated Genomic Landscape of Thymic Epithelial Tumors. Cancer Cell 33:244-258.e10
Bailey, Matthew H; Tokheim, Collin; Porta-Pardo, Eduard et al. (2018) Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173:371-385.e18
Robertson, A Gordon; Kim, Jaegil; Al-Ahmadie, Hikmat et al. (2017) Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer. Cell 171:540-556.e25
Cancer Genome Atlas Research Network (2017) Comprehensive and Integrated Genomic Characterization of Adult Soft Tissue Sarcomas. Cell 171:950-965.e28
Robertson, A Gordon; Shih, Juliann; Yau, Christina et al. (2017) Integrative Analysis Identifies Four Molecular and Clinical Subsets in Uveal Melanoma. Cancer Cell 32:204-220.e15
Cancer Genome Atlas Research Network (2017) Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 32:185-203.e13
Cancer Genome Atlas Research Network; Analysis Working Group: Asan University; BC Cancer Agency et al. (2017) Integrated genomic characterization of oesophageal carcinoma. Nature 541:169-175
Cancer Genome Atlas Research Network (2017) Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma. Cell 169:1327-1341.e23