Building from the PI's current R01, we propose next-generation random forests (RF) designed for unprecedented accuracy and computational scalability to meet the challenges of today's complex and big data in the health sciences. Superior accuracy is achieved using super greedy trees, which circumvent the limitations on local adaptivity imposed by classical tree splitting. We identify a key quantity, forest weights, and show how these can be leveraged for further improvements and generalizability. In one application, improved survival estimators are applied to worldwide esophageal cancer data to develop guidelines for clinical decision making. Richer RF inference is another issue explored. Cutting-edge machine learning methods rarely consider the problem of estimating variability. For RF, bootstrapping is currently the only tool for reliably estimating confidence intervals, but it is rarely applied because of its heavy computational cost. We introduce tools to rapidly calculate standard errors based on U-statistic theory. These will be used to increase the robustness of clinical recommendations for esophageal cancer and to investigate temporal trends in survival in cardiovascular disease. In another application, we make use of our new massive-data scalability to discover tumor and immune regulators of immunotherapy in cancers. This project will set the standard for RF computational performance. Building from the core libraries of the highly accessed R package randomForestSRC (RF-SRC), software developed under the PI's current R01, we develop open-source next-generation RF software: RF-SRC Everywhere, Big Data RF-SRC, and HPC RF-SRC. The software will be deployable on a number of popular machine learning workbenches, use distributed data storage technologies, and be optimized for big-p, big-n, and big-np scenarios.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM125072-08
Application #
9929599
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2017-09-01
Project End
2021-05-31
Budget Start
2020-06-01
Budget End
2021-05-31
Support Year
8
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Miami School of Medicine
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
052780918
City
Coral Gables
State
FL
Country
United States
Zip Code
33146
Ishwaran, Hemant; Lu, Min (2018) Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Stat Med :
Rech, Andrew J; Balli, David; Mantero, Alejandro et al. (2018) Tumor Immunity and Survival as a Function of Alternative Neopeptides in Human Cancer. Cancer Immunol Res :
Lamont, Andrea; Lyons, Michael D; Jaki, Thomas et al. (2018) Identification of predicted individual treatment effects in randomized clinical trials. Stat Methods Med Res 27:142-157
Dazard, Jean-Eudes; Ishwaran, Hemant; Mehlotra, Rajeev et al. (2018) Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting. Stat Appl Genet Mol Biol 17:
Lu, Min; Ishwaran, Hemant (2018) A prediction-based alternative to P values in regression models. J Thorac Cardiovasc Surg 155:1130-1136.e4
Lu, Min; Sadiq, Saad; Feaster, Daniel J et al. (2018) Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods. J Comput Graph Stat 27:209-219
Hsich, Eileen M; Blackstone, Eugene H; Thuita, Lucy et al. (2017) Sex Differences in Mortality Based on United Network for Organ Sharing Status While Awaiting Heart Transplantation. Circ Heart Fail 10: