Cancer is a disease of the genome, caused by disruptions in a person's DNA. Orders-of-magnitude decreases in price and increases in sequencing throughput have enabled the sequencing of hundreds of genomes. This will launch a new phase of "Precision Medicine," where molecular markers can guide therapies tailored to patients. The genomics revolution is now systematically characterizing every somatic change in every tumor for large cohorts (>300 patients). Despite some successes, predicting cancer outcomes based on molecular signatures remains a major challenge. This proposal aims to remove several key roadblocks that impede progress. First, raw sequence data are not well suited to developing predictive models. Therefore, gene- and pathway-level evidence will be derived from CGHub to significantly increase the utility of the information for biomedical discovery. The information will be collected in the Biomedical Evidence Graph (BMEG), built on a social-graph technology framework like Facebook's so that it can scale to billions of interconnected objects. Second, the datasets are so large that they are impractical to move across the internet. Thus, an environment will be created in which researchers can move their code to the vast amounts of data within the BMEG. Third, prediction challenges will be created based on cancer genomics datasets and patient outcomes. While there have been a few successes in predicting outcomes, current approaches suffer from poor reproducibility and robustness when applied to unseen data. This activity will reach a broad community of algorithm developers, promote transparency and sharing of bioinformatics code, and create a strong network effect to crowd-source the development of the best models for biological discovery.
The system will focus on the investigation of cancer outcomes, but the entire pipeline will be of general utility for any number of genome-based projects, including investigations of any number of diseases, stem cell properties, model organisms, and genome-wide association studies.
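
To make the idea of a graph of interconnected evidence objects concrete, the following is a minimal sketch of how samples, genes, and pathways might be linked and queried. It is an illustration only, not the BMEG design: the class and function names (`EvidenceGraph`, `pathways_for_sample`, `build_example`), the identifier scheme, the edge labels, and the toy records are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    gid: str    # hypothetical global identifier, e.g. "gene:TP53"
    label: str  # object type, e.g. "Sample", "Gene", "Pathway"


@dataclass
class EvidenceGraph:
    """Tiny in-memory property graph (illustrative, not the BMEG schema)."""
    vertices: dict = field(default_factory=dict)
    # Outgoing edges: source gid -> list of (edge label, target gid).
    out_edges: dict = field(default_factory=dict)

    def add_vertex(self, gid, label):
        self.vertices[gid] = Vertex(gid, label)

    def add_edge(self, src, edge_label, dst):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.out_edges.setdefault(src, []).append((edge_label, dst))

    def neighbors(self, gid, edge_label):
        """Targets reachable from gid over edges with the given label."""
        return [dst for lbl, dst in self.out_edges.get(gid, [])
                if lbl == edge_label]


def pathways_for_sample(graph, sample_gid):
    """Two-hop query: sample -> mutated genes -> pathways containing them."""
    pathways = set()
    for gene in graph.neighbors(sample_gid, "has_mutation_in"):
        pathways.update(graph.neighbors(gene, "member_of"))
    return sorted(pathways)


def build_example():
    """Build a toy graph; the records below are made up for illustration."""
    g = EvidenceGraph()
    g.add_vertex("sample:S1", "Sample")
    g.add_vertex("gene:TP53", "Gene")
    g.add_vertex("pathway:p53_signaling", "Pathway")
    g.add_edge("sample:S1", "has_mutation_in", "gene:TP53")
    g.add_edge("gene:TP53", "member_of", "pathway:p53_signaling")
    return g


if __name__ == "__main__":
    print(pathways_for_sample(build_example(), "sample:S1"))
```

The two-hop query shows how gene-level evidence can be rolled up to pathway-level evidence by following edges. A system meant to hold billions of objects would need a distributed graph store in place of the in-memory dictionaries used here.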

Public Health Relevance

While we are accumulating vast amounts of information on cancer cells, we are still searching in the dark for clues that predict which treatment strategies will work. It is of paramount importance to accelerate computational discovery. The creation of the BMEG will catalyze community participation to uncover novel relationships that elucidate the fundamental biology of oncogenesis and point to new therapeutic directions for treating this disease.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Li, Jerry
University of California Santa Cruz
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
Santa Cruz
United States
Hoadley, Katherine A; Yau, Christina; Wolf, Denise M et al. (2014) Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158:929-44
Boutros, Paul C; Ewing, Adam D; Ellrott, Kyle et al. (2014) Global optimization of somatic variant identification in cancer genomes with a global community challenge. Nat Genet 46:318-9