Ensemble learning involves the simple task of taking elementary procedures (base learners) and combining them to form an ensemble. This simple process often yields a predictor with superior performance; one of the most successful examples is random forests (RF), an ensemble formed using random tree base learners. In this project we use RF to study a collection of cancer-related problems. One area of focus is a specific pathway in breast cancer. To date, much of the work on elucidating the molecular characteristics of breast cancer has focused on gene expression profiling. The resulting signatures are principally markers of proliferation and do not clearly identify novel or metastasis-specific pathways. We recently showed experimentally how the breast cancer gene Raf Kinase Inhibitory Protein (RKIP) regulates a specific metastasis pathway. Importantly, the RKIP pathway does not influence primary tumor growth or cell proliferation but instead involves metastasis-specific steps. Having worked out the RKIP pathway in experimental detail, we will use RF in this project to verify statistically, using expression data from primary tumor samples, that RKIP operationally drives clinical metastasis. However, this poses a dilemma. While forests are ideal tools for fitting interactions, no rigorous methodology currently exists for untangling the highly involved variable relationships within a forest, and there is no comprehensive, rigorous method for selecting variables. To address this, we develop a unified prediction and variable selection framework. Within this framework we introduce a new variable selection statistic for identifying interactions, which we use to validate the RKIP pathway; the framework also facilitates use of the statistic in general. In another application, we introduce grouped variable comparisons for building gene pathways.
Using this approach, we expand our work on the Interferon-Related DNA Damage Resistance Signature (IRDS), a therapeutic signature that can predict resistance to chemotherapy and/or radiation across a wide variety of common human cancers. We describe a regulatory biological network for the IRDS based on multi-dimensional genomics data. Edges of this network are weighted using an RF measure of variable relatedness to pinpoint important gene-gene interactions. In another major thrust, using a uniquely rich worldwide esophageal cancer database, we describe individualized treatment recommendations for esophageal cancer patients based on a novel RF algorithm for stage grouping and prognostication. The algorithm is general enough to be applied to other cancers, thus providing physicians, oncologists, and other cancer health care professionals with a powerful new data-analytic tool for individualized prognostication and treatment decision making. To share the methodological and statistical advances in RF arising from this project, we develop a user-friendly unified RF software package, RF-SRC, to be made freely available under the GNU General Public License. The software will be massively scalable through cutting-edge parallelization.
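To make the ensemble idea at the start of the abstract concrete (resample the data, grow a base learner that may consider only a random subset of variables at its split, then combine the learners by majority vote), the sketch below grows a toy classification forest of depth-one trees ("stumps") using only the Python standard library. It is an illustrative assumption, not RF-SRC and not the project's methodology: practical random forests grow deep trees, and the function names (`fit_forest`, `predict_forest`, `fit_stump`) and the `mtry` parameter name are hypothetical choices made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, mtry, rng):
    """Fit a depth-one tree, searching only `mtry` randomly chosen variables."""
    candidates = rng.sample(range(len(X[0])), mtry)
    # (impurity, variable, threshold, left label, right label); default is no split
    best = (gini(y), None, None, majority(y), majority(y))
    for j in candidates:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            # Size-weighted Gini impurity of the two daughter nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    return best[1:]


def predict_stump(stump, row):
    """Route a row to the left or right daughter and return its label."""
    j, t, left_label, right_label = stump
    if j is None:  # no split reduced impurity: predict the node majority
        return left_label
    return left_label if row[j] <= t else right_label


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Grow each base learner on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, row):
    """Combine the base learners by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])
```

Restricting each split to `mtry` randomly chosen variables decorrelates the base learners; setting `mtry` equal to the number of variables reduces the procedure to plain bagging.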

Public Health Relevance

We study several problems related to cancer using random forests (RF) and describe an enhanced, unified RF that can serve as a general-purpose data analysis tool with massive parallel scalability.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 1R01CA163739-01A1
Application #: 8368988
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ossandon, Miguel
Project Start: 2012-08-10
Project End: 2016-05-31
Budget Start: 2012-08-10
Budget End: 2013-05-31
Support Year: 1
Fiscal Year: 2012
Total Cost: $255,090
Indirect Cost: $84,828
Organization Name: University of Miami School of Medicine
Department: Public Health & Prev Medicine
Organization Type: Schools of Medicine
DUNS #: 052780918
City: Coral Gables
State: FL
Country: United States
Zip Code: 33146
Publications

Boelens, Mirjam C; Wu, Tony J; Nabet, Barzin Y et al. (2014) Exosome transfer from stromal to breast cancer cells regulates therapy resistance pathways. Cell 159:499-513
Chen, Xi; Ishwaran, Hemant (2013) Pathway hunting by random survival forests. Bioinformatics 29:99-105