Survival analysis is widely used in cancer research to identify prognostic factors for cancers and risk factors for cancer recurrence or survival after treatment, yet there is no consensus on how to measure the variation in event times explained by available factors. Many analogues of the coefficient of determination, also known as R-squared, have been proposed for proportional hazards models. However, some of these measures are bounded above by a value much smaller than one, even when the event time is fully determined by the available factors, and others are overly sensitive to spuriously correlated factors. Research on such measures for accelerated failure time models is limited; to the best of our knowledge, the only measure proposed recently relies on parametrically partitioning the total variation into explained and unexplained parts, assuming that the true model is known. To address this gap, the objective of this project is to develop proper statistics that measure the variation of event times explained by available factors under common right-censoring mechanisms. The premise of this proposal is that a variance function can describe how variation depends on the mean, and that quantifying the change in variation along the variance function can measure the explained variation of heteroscedastic event times. Recent work on generalized linear models demonstrated that a variance-function-based R-squared appropriately measures the explained variation of non-Gaussian responses. Building on that successful extension, the two-year research study proposed here focuses on the following two specific aims:
Aim 1. To measure explained variation for accelerated failure time models. Since each accelerated failure time model has a quadratic variance function, the team will construct the variance-function-based R-squared for such survival models, addressing censoring through proper integration or adjustment. Treating accelerated failure time models as censored linear regression models, these studies will also extend the classical R-squared with proper handling of censoring.
Aim 2. To measure explained variation for proportional hazards models. Treating the partial likelihood function as the likelihood function of a conditional logistic model, the investigators will construct the variance-function-based R-squared for the pertinent conditional logistic model in order to measure the explained variation in the underlying proportional hazards model. In addition, the team will construct a variance-function-based R-squared by measuring the variation of an underlying survival process, which yields a binary random variable at each specific time.
A rigorous evaluation using both simulations and real cancer studies will be designed to validate the proposed measures across different models in cancer research. The proposed measures will be implemented in the publicly available R package rsq, providing cancer researchers with a useful tool for conducting the necessary survival analysis. The success of this project will ultimately help quantify and understand the heritability of different cancers.
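To make the premise concrete, the following is a minimal Python sketch of one way a variance-function-based R-squared could be computed for uncensored data. It assumes that the distance between two values is the squared arc length of the variance function curve between them; this form, along with the names `arc_length_sq` and `rsq_v` and the example variance function, is an illustrative assumption rather than the project's implementation, and the censoring adjustments described in the aims are not addressed here.

```python
import math


def arc_length_sq(a, b, v_prime, steps=200):
    """Squared length of the curve (t, V(t)) between t = a and t = b.

    The length is the integral of sqrt(1 + V'(t)^2), approximated here
    with the composite Simpson rule (steps must be even).
    Assumption: this distance form is illustrative only.
    """
    h = (b - a) / steps

    def integrand(t):
        return math.sqrt(1.0 + v_prime(t) ** 2)

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(a + i * h)
    length = total * h / 3.0
    return length ** 2


def rsq_v(y, mu_hat, v_prime):
    """One minus the ratio of fitted-model to null-model variation.

    y       observed (uncensored) responses
    mu_hat  fitted means from the model of interest
    v_prime derivative of the assumed variance function V
    """
    y_bar = sum(y) / len(y)
    unexplained = sum(arc_length_sq(yi, mi, v_prime) for yi, mi in zip(y, mu_hat))
    total = sum(arc_length_sq(yi, y_bar, v_prime) for yi in y)
    return 1.0 - unexplained / total


# Hypothetical data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
mu_hat = [1.1, 1.9, 3.2, 3.8]

# Constant variance function: V'(t) = 0, so the distance is (a - b)^2.
r2_constant = rsq_v(y, mu_hat, lambda t: 0.0)

# Quadratic variance function V(t) = 0.5 * t^2, so V'(t) = t (hypothetical choice).
r2_quadratic = rsq_v(y, mu_hat, lambda t: t)
```

With a constant variance function the distance reduces to the squared difference, so `r2_constant` matches the classical R-squared; with a non-constant variance function, departures in regions where the variance changes quickly are weighted more heavily.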
In cancer research, it is crucial to identify prognostic factors for cancer onset times or survival times after treatment. Upon completion of the proposed work, a set of properly defined measures will be available for quantifying the variation in event times attributable to a group of prognostic factors. These measures will help improve our knowledge of cancers and our assessment of the predictive ability of available prognostic factors, including genetic factors.