Predicting survival in cancer is important because survival, to a large extent, determines the selection of a particular therapy. Most cancer data sets contain large numbers of cases with missing data, and the usual approach is to remove such cases. But this reduction in data set size, combined with the further reduction caused by splitting the data set into training and testing subsets, can significantly reduce the accuracy of statistical models. This research seeks to validate at least one of two promising approaches to missing data, one based on Mixture Networks and the other on Iterative Relaxation. Both methods avoid throwing away the cases that contain missing data, and both use all the available data to estimate the missing values. Phase II will build a full "Missing Data Handler" software package for general application, handling missing data in three contexts: as a standalone package, integrated into Belmont's CrossGraphs visualization software, and integrated into Belmont's ClinTrans prototype for database transformations. The major technical innovation in Phase I will be a general method for working with data that contain missing values. The major health-related contribution will be improved accuracy in predicting cancer survival, resulting in better therapy selection and improved survival.
A successful project will result in new tools for handling missing data. These tools will be incorporated into existing application software packages developed and licensed by Belmont Research, Inc.