Our goals are to provide enhancements to Classification and Regression Tree (CART) software (Breiman, Friedman, Olshen and Stone, 1984) in SYSTAT and SAS, including: (1) a complete graphical interface; (2) tree editing of splits and manual tree pruning and grafting; (3) standard statistical and modeling procedures for all cases in a node; (4) diagnostic reports, including instability among cross-validation trees and confidence intervals for variable importances; (5) Breiman's within-node error measure, Olshen's massively repeated cross-validation for node-specific measures, and optional averaging of 5-10 repeated cross-validation runs for overall error; (6) Boolean (logical "and/or" compound splitting) rules; (7) Friedman's numerically superior linear combination splits; (8) algorithms for censored survival data; (9) algorithms for serially dependent observations; (10) Breiman's Bootstrap Aggregation predictors, a consensus of trees grown on bootstrapped samples that improves accuracy; (11) Breiman's probabilistic splitting, which produces a probability distribution over terminal nodes for each observation; (12) split revisiting, in which trees are improved by reoptimizing a parent split while holding child splits fixed, yielding better-performing and/or smaller trees; (13) automatic missing value imputation and multivariate outlier detection; (14) faster running times and a version for multiple-CPU platforms; and (15) tutorial materials.
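
As a rough illustration of the bootstrap aggregation idea in item (10), the following minimal sketch grows one-split trees (stumps) on bootstrapped samples and takes a majority vote. It is not the CART implementation: the function names (`fit_stump`, `bagged_stumps`, `predict_bagged`), the single numeric feature, and the misclassification-count split criterion are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classifier on a single numeric feature.

    Returns (threshold, left_label, right_label) with the fewest
    training errors. Hypothetical helper, not part of any CART product.
    """
    default = Counter(ys).most_common(1)[0][0]
    # Fallback when no split helps: always predict the majority class.
    best = (float("inf"), default, default)
    best_err = sum(y != default for y in ys)
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if err < best_err:
            best_err, best = err, (t, l_lab, r_lab)
    return best

def predict_stump(stump, x):
    t, l_lab, r_lab = stump
    return l_lab if x <= t else r_lab

def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Grow one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Consensus prediction: majority vote over all stumps."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

# Toy data: class 0 at small x, class 1 at large x.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = bagged_stumps(xs, ys)
```

A real implementation would grow full trees with an impurity-based criterion over many predictors; the stump is used here only to keep the resample-and-vote structure visible.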

Proposed Commercial Applications

By making CART usable by analysts at most skill levels, on all major computing platforms, in SAS and SYSTAT, we anticipate a market that includes users in all areas of biomedical research. We expect to ship for desktop and single-user workstations as well as research mainframes and minicomputers worldwide, including university and government computing centers, hospital research centers, and pharmaceutical companies. As we already have over 100 mainframe SAS sites and thousands of SYSTAT PC sites, we anticipate excellent acceptance of enhanced CART products.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44CA065338-03
Application #: 2545368
Study Section: Special Emphasis Panel (ZRG7-SSS-1 (23))
Program Officer: Erickson, Burdette (BUD) W
Project Start: 1994-09-15
Project End: 1998-09-29
Budget Start: 1997-09-30
Budget End: 1998-09-29
Support Year: 3
Fiscal Year: 1997
Total Cost:
Indirect Cost:
Name: Salford Systems
Department:
Type:
DUNS #:
City: San Diego
State: CA
Country: United States
Zip Code: 92120