Our proposal for a Sage CCSB, "Integrating cancer datasets for predictive model development and training," has as its central scientific theme the generation of a set of probabilistic causal models for a series of tumor types, using data contributed by numerous collaborators. Because the sample sets are selected to span different clinical outcomes, the resulting Sage models will have applications in cancer biology, early intervention, and cancer treatment. The Sage CCSB leverages the extensive work on predictive models in numerous disease areas done at Rosetta/Merck, which has been gifted to a new nonprofit medical research organization, Sage Bionetworks. The Sage CCSB operational model comprises a core platform of curated data and mathematical models, together with experienced investigators who mentor postdoctoral trainees/fellows. The data come from collaborators and consist of DNA variation data, RNA expression data, and clinical outcomes. The trainees will collate and annotate the genotypic, intermediate molecular phenotype, and clinical endpoint data from at least five different tumor-type cohorts and develop models that can predict potential new cancer targets, markers for early detection, and clinical outcomes. They will complete externships at other sites (CCSBs), where they will build additional models of their data and facilitate reciprocal exchange of ideas. The trainees will also define specifications for tools that make access to these models more scalable. Validation of their hypotheses will be performed at the Fred Hutchinson Cancer Research Center and the Netherlands Cancer Institute. This postdoctoral program will provide a unique training and mentorship environment in cancer systems biology and facilitate interactions between CCSBs and NCI.

Public Health Relevance

The massive generation of molecular information in oncology will not in itself change cancer death rates. This highlights the need to transition from archiving and binning facts to building predictive models of disease that help patients. Probabilistic causal models built on curated data will enable the identification of early detection markers and directed therapies, as well as the prediction of outcomes. The Sage CCSB will enable this distributed model building while training scientists, building interface tools, and linking models between sites.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Specialized Center--Cooperative Agreements (U54)
Study Section: Special Emphasis Panel (ZCA1-SRLB-C (J1))
Program Officer: Gallahan, Daniel L
Sage Bionetworks
United States
Commo, F; Ferté, C; Soria, J C et al. (2015) Impact of centralization on aCGH-based genomic profiles for precision medicine in oncology. Ann Oncol 26:582-8
Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A et al. (2015) Stepwise group sparse regression (SGSR): gene-set-based pharmacogenomic predictive models with stepwise selection of functional priors. Pac Symp Biocomput :32-43
Chaibub Neto, Elias (2015) Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications. PLoS One 10:e0131333
Dienstmann, Rodrigo; Jang, In Sock; Bot, Brian et al. (2015) Database of genomic biomarkers for cancer drugs and clinical targetability in solid tumors. Cancer Discov 5:118-23
Guinney, Justin; Dienstmann, Rodrigo; Wang, Xin et al. (2015) The consensus molecular subtypes of colorectal cancer. Nat Med 21:1350-6
Mikheev, Andrei M; Mikheeva, Svetlana A; Trister, Andrew D et al. (2015) Periostin is a novel therapeutic target that predicts and regulates glioma malignancy. Neuro Oncol 17:372-82
Gönen, Mehmet; Kaski, Samuel (2014) Kernelized Bayesian Matrix Factorization. IEEE Trans Pattern Anal Mach Intell 36:2047-60
Gönen, Mehmet (2014) Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning. Pattern Recognit Lett 38:132-141
Chaibub Neto, Elias; Bare, J Christopher; Margolin, Adam A (2014) Simulation studies as designed experiments: the comparison of penalized regression models in the "large p, small n" setting. PLoS One 9:e107957
Gönen, Mehmet; Margolin, Adam A (2014) Drug susceptibility prediction against a panel of drugs using kernelized Bayesian multitask learning. Bioinformatics 30:i556-63

Showing the most recent 10 out of 29 publications