Ensemble learning combines elementary procedures (base learners) to form an ensemble. This simple process often yields a predictor with superior performance; one of the most successful examples is random forests (RF), an ensemble formed using random tree base learners. In this project we use RF to study a collection of cancer-related problems. One area of focus involves a specific pathway in breast cancer. To date, much of the work in elucidating the molecular characteristics of breast cancer has focused on gene expression profiling. The resulting signatures are principally markers for proliferation and do not clearly identify novel or metastasis-specific pathways. We recently showed experimentally how the breast cancer gene Raf Kinase Inhibitory Protein (RKIP) regulates a specific metastasis pathway. Importantly, the RKIP pathway does not influence primary tumor growth or cell proliferation but rather involves metastasis-specific steps. Having worked out the RKIP pathway in experimental detail, we will use RF to verify statistically that RKIP operationally drives clinical metastasis, using expression data from primary tumor samples. However, this poses a dilemma. While forests are ideal tools for fitting interactions, no rigorous methodology currently exists for untangling the highly involved variable relationships within a forest, and there is no comprehensive and rigorous method for selecting variables. In this project we develop a unified prediction and variable selection framework to address this. Applying this framework, we introduce a new variable selection statistic for identifying interactions and use it to validate the RKIP pathway. We also develop a unified framework to facilitate the general use of this statistic. In another application, we introduce grouped variable comparisons for building gene pathways.
Using these comparisons, we expand our work on the Interferon-Related DNA Damage Resistance Signature (IRDS), a therapeutic signature that can predict resistance to chemotherapy and/or radiation across a wide variety of common human cancers. We describe a regulatory biological network for the IRDS based on multi-dimensional genomics data. Edges of this network are weighted using an RF measure of variable relatedness to pinpoint important gene-gene interactions. In another major thrust, using a uniquely rich worldwide esophageal cancer database, we describe individualized treatment recommendations for esophageal cancer patients using a novel RF algorithm for stage grouping and prognostication. The algorithm is general enough to be applied to other cancers, thus providing physicians, oncologists, and other cancer health care professionals with a powerful new data-analytic tool for individualized prognostication and treatment decision making. To share the methodological and statistical advancements of RF arising from this project, we develop user-friendly unified RF software, RF-SRC, to be made freely available under the GNU Public License. This software will allow for massive scalability by utilizing cutting-edge parallelization solutions.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA163739-03
Application #: 8676476
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ossandon, Miguel
Project Start: 2012-08-10
Project End: 2016-05-31
Budget Start: 2014-06-01
Budget End: 2015-05-31
Support Year: 3
Fiscal Year: 2014
Total Cost: $241,353
Indirect Cost: $69,430

Name: University of Miami School of Medicine
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 052780918
City: Coral Gables
State: FL
Country: United States
Zip Code: 33146

Dazard, Jean-Eudes; Ishwaran, Hemant; Mehlotra, Rajeev et al. (2018) Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting. Stat Appl Genet Mol Biol 17:
Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham et al. (2017) Boosted Multivariate Trees for Longitudinal Data. Mach Learn 106:277-305
Tang, Fei; Ishwaran, Hemant (2017) Random Forest Missing Data Algorithms. Stat Anal Data Min 10:363-377
Rice, Thomas W; Ishwaran, Hemant; Ferguson, Mark K et al. (2017) Cancer of the Esophagus and Esophagogastric Junction: An Eighth Edition Staging Primer. J Thorac Oncol 12:36-42
Rice, Thomas W; Ishwaran, Hemant; Kelsen, David P et al. (2016) Recommendations for neoadjuvant pathologic staging (ypTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:906-912
Benci, Joseph L; Xu, Bihui; Qiu, Yu et al. (2016) Tumor Interferon Signaling Regulates a Multigenic Resistance Program to Immune Checkpoint Blockade. Cell 167:1540-1554.e12
Rice, Thomas W; Ishwaran, Hemant; Blackstone, Eugene H et al. (2016) Recommendations for clinical staging (cTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:913-919
Rice, T W; Ishwaran, H; Hofstetter, W L et al. (2016) Recommendations for pathologic staging (pTNM) of cancer of the esophagus and esophagogastric junction for the 8th edition AJCC/UICC staging manuals. Dis Esophagus 29:897-905
Rice, T W; Lerut, T E M R; Orringer, M B et al. (2016) Worldwide Esophageal Cancer Collaboration: neoadjuvant pathologic staging data. Dis Esophagus 29:715-723
Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling et al. (2016) Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests. Proc IEEE Int Conf Inf Reuse Integr 2016:601-608
