Cancer is a disease of the genome, caused by disruptions in a person's DNA. Order-of-magnitude decreases in price and increases in sequencing throughput have enabled the sequencing of hundreds of genomes. This will launch a new phase of "Precision Medicine," in which molecular markers guide therapies tailored to individual patients. The genomics revolution is now systematically characterizing every somatic change in every tumor for large cohorts (>300 patients). Despite some successes, predicting cancer outcomes from molecular signatures remains a major challenge. This proposal aims to remove several key roadblocks to progress. First, raw sequence data is not well suited to developing predictive models. Therefore, gene- and pathway-level evidence will be derived from CGHub to significantly increase the utility of the information for biomedical discovery. The information will be collected in the Biomedical Evidence Graph (BMEG), built on a social-graph technology framework like Facebook's so that it can scale to billions of interconnected objects. Second, the datasets are so large that they are impractical to move over the internet. Thus, an environment will be created in which researchers can move their code to the vast amounts of data within the BMEG. Third, prediction challenges will be created based on cancer genomics datasets and patient outcomes. While there have been a few successes in predicting outcomes, current approaches suffer from poor reproducibility and robustness when applied to unseen data. This activity will reach a broad community of algorithm developers, promote transparency and sharing of bioinformatics code, and create a strong network effect to crowd-source the development of the best models for biological discovery.
The system will focus on the investigation of cancer outcomes, but the entire pipeline will be of general utility for genome-based projects, including investigations of other diseases, stem cell properties, model organisms, and genome-wide association studies.

Public Health Relevance

While we are accumulating vast amounts of information on cancer cells, we are still searching in the dark for clues for predicting treatment strategies. Accelerating computational discovery is of paramount importance. The creation of the BMEG will catalyze community participation to uncover novel relationships that elucidate the fundamental biology of oncogenesis and point to new therapeutic directions for treating this disease.

National Institutes of Health (NIH)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Li, Jerry
University of California Santa Cruz
Engineering (All Types)
Biomed Engr/Col Engr/Engr Sta
Santa Cruz
United States
Si, H; Lu, H; Yang, X et al. (2016) TNF-α modulates genome-wide redistribution of ΔNp63α/TAp73 and NF-κB cREL interactive binding on TP53 and AP-1 motifs to promote an oncogenic gene program in squamous cancer. Oncogene 35:5781-5794
Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas et al. (2016) Inferring causal molecular networks: empirical assessment through a community-based effort. Nat Methods 13:310-8
Lee, John K; Phillips, John W; Smith, Bryan A et al. (2016) N-Myc Drives Neuroendocrine Prostate Cancer Initiated from Human Prostate Epithelial Cells. Cancer Cell 29:536-47
Sokolov, Artem; Paull, Evan O; Stuart, Joshua M (2016) One-Class Detection of Cell States in Tumor Subtypes. Pac Symp Biocomput 21:405-16
Sokolov, Artem; Carlin, Daniel E; Paull, Evan O et al. (2016) Pathway-Based Genomics Prediction using Generalized Elastic Net. PLoS Comput Biol 12:e1004790
Drake, Justin M; Paull, Evan O; Graham, Nicholas A et al. (2016) Phosphoproteome Integration Reveals Patient-Specific Networks in Prostate Cancer. Cell 166:1041-54
Ewing, Adam D; Houlahan, Kathleen E; Hu, Yin et al. (2015) Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat Methods 12:623-30
Mutation Consequences and Pathway Analysis working group of the International Cancer Genome Consortium (2015) Pathway and network analysis of cancer genomes. Nat Methods 12:615-21
Paten, Benedict; Diekhans, Mark; Druker, Brian J et al. (2015) The NIH BD2K center for big data in translational genomics. J Am Med Inform Assoc 22:1143-7