Ensemble learning combines elementary procedures (base learners) into an ensemble. This simple process often yields a predictor with superior performance; one of the most successful examples is random forests (RF), an ensemble formed from random tree base learners. In this project we use RF to study a collection of cancer-related problems. One area of focus involves a specific pathway in breast cancer. To date, much of the work in elucidating the molecular characteristics of breast cancer has focused on gene expression profiling. The resulting signatures are principally markers for proliferation and do not clearly identify novel or metastasis-specific pathways. We recently showed experimentally how the breast cancer gene Raf Kinase Inhibitory Protein (RKIP) regulates a specific metastasis pathway. Importantly, the RKIP pathway does not influence primary tumor growth or cell proliferation but rather involves metastasis-specific steps. Having worked out the RKIP pathway in experimental detail, we will use RF in this project to verify statistically that RKIP operationally drives clinical metastasis, using expression data from primary tumor samples. This poses a dilemma, however. While forests are ideal tools for fitting interactions, no rigorous methodology currently exists for untangling the highly involved variable relationships within a forest, and there is no comprehensive and rigorous method for selecting variables. In this project we develop a unified prediction and variable selection framework to address this. Within this framework we introduce a new variable selection statistic for identifying interactions and use it to validate the RKIP pathway; the framework also facilitates use of this statistic in general. In another application, we introduce grouped variable comparisons for building gene pathways.
Using these comparisons, we expand our work on the Interferon-Related DNA Damage Resistance Signature (IRDS), a therapeutic signature that can predict resistance to chemotherapy and/or radiation across a wide variety of common human cancers. We describe a regulatory biological network for the IRDS based on multi-dimensional genomics data. Edges of this network are weighted using an RF measure of variable relatedness to pinpoint important gene-gene interactions. In another major thrust, using a uniquely rich worldwide esophageal cancer database, we describe individualized treatment recommendations for esophageal cancer patients using a novel RF algorithm for stage grouping and prognostication. The algorithm is general enough to be applied to other cancers, thus providing physicians, oncologists, and other cancer health care professionals with a powerful new data-analytic tool for individualized prognostication and treatment decision making. To share the methodological and statistical advancements of RF arising from this project, we develop a user-friendly, unified RF software package, RF-SRC, to be made freely available under the GNU General Public License. This software will allow for massive scalability by utilizing cutting-edge parallelization solutions.
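
The ensemble idea described at the start of the abstract (bootstrap the data, grow a randomized base learner on each sample, combine by voting) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the base learners are depth-one trees (stumps) rather than full random trees, the data are synthetic, accuracy is measured in-sample, and the importance measure is plain permutation importance, not the variable selection statistic developed in the project. The function names are hypothetical and are not part of RF-SRC.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Fit a depth-one tree using the best split among the candidate features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in feature_ids:
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not right:
                continue  # a split must send data to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, j, threshold, l_lab, r_lab)
    if best is None:  # every candidate feature is constant: predict the majority
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, l_lab, r_lab = stump
    return l_lab if row[j] <= threshold else r_lab


def fit_forest(rows, labels, n_trees, mtry, rng):
    """Grow each stump on a bootstrap sample with a random subset of features."""
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), mtry)          # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Combine the base learners by majority vote."""
    return majority([predict_stump(s, row) for s in forest])


def accuracy(forest, rows, labels):
    return sum(predict_forest(forest, r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(forest, rows, labels, j, rng):
    """Drop in accuracy after shuffling feature j across rows."""
    column = [row[j] for row in rows]
    rng.shuffle(column)
    shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
    return accuracy(forest, rows, labels) - accuracy(forest, shuffled, labels)


rng = random.Random(0)
# Synthetic data: only feature 0 determines the label; features 1 and 2 are noise.
X = [[rng.random() for _ in range(3)] for _ in range(100)]
y = [int(row[0] > 0.5) for row in X]

forest = fit_forest(X, y, n_trees=101, mtry=2, rng=rng)
train_accuracy = accuracy(forest, X, y)
importance = [permutation_importance(forest, X, y, j, rng) for j in range(3)]
print(train_accuracy, importance)
```

In this toy setting the informative feature should show a much larger importance than the two noise features. A real analysis would use deep trees, out-of-bag error rather than in-sample accuracy, and an established RF implementation.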

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ossandon, Miguel
University of Miami School of Medicine
Public Health & Prev Medicine
Schools of Medicine
Coral Gables
United States
Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham et al. (2017) Boosted Multivariate Trees for Longitudinal Data. Mach Learn 106:277-305
Benci, Joseph L; Xu, Bihui; Qiu, Yu et al. (2016) Tumor Interferon Signaling Regulates a Multigenic Resistance Program to Immune Checkpoint Blockade. Cell 167:1540-1554.e12
Rice, Thomas W; Ishwaran, Hemant; Blackstone, Eugene H et al. (2016) Recommendations for clinical staging (cTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:913-919
Rice, T W; Ishwaran, H; Hofstetter, W L et al. (2016) Recommendations for pathologic staging (pTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:897-905
Rice, T W; Lerut, T E M R; Orringer, M B et al. (2016) Worldwide Esophageal Cancer Collaboration: neoadjuvant pathologic staging data. Dis Esophagus 29:715-723
Rice, T W; Chen, L-Q; Hofstetter, W L et al. (2016) Worldwide Esophageal Cancer Collaboration: pathologic staging data. Dis Esophagus 29:724-733
Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling et al. (2016) Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests. Proc IEEE Int Conf Inf Reuse Integr 2016:601-608
Rice, Thomas W; Ishwaran, Hemant; Kelsen, David P et al. (2016) Recommendations for neoadjuvant pathologic staging (ypTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:906-912
Rice, T W; Apperson-Hansen, C; DiPaola, L M et al. (2016) Worldwide Esophageal Cancer Collaboration: clinical staging data. Dis Esophagus 29:707-714
Twyman-Saint Victor, Christina; Rech, Andrew J; Maity, Amit et al. (2015) Radiation and dual checkpoint blockade activate non-redundant immune mechanisms in cancer. Nature 520:373-7

Showing the most recent 10 out of 18 publications