The objective of the proposed research is to develop and apply new procedures for analyzing data with missing covariates in the context of three cancer studies, with the following aims: to develop biostatistical methods that will be generally useful in research on cancer as well as in other research areas; to conduct innovative analyses of the data from our studies; and to develop guidelines for when specific methods are preferable to others. To focus the techniques and illustrate the methodology, we will analyze data from the following studies: a randomized clinical trial to study the efficacy of dietary supplements in preventing polyps of the large bowel; a retrospective study of the recurrence of tonsil cancer following radiation therapy; and a randomized clinical trial and related prospective cohort study that will be used to examine the time to recurrence of lung cancer following surgery and the time to death following surgery. The models used to analyze the three studies will include logistic regression, survival models, and cure models that combine logistic regression and survival models. The procedures that we will develop to fit these models in the presence of missing covariate data include maximum-likelihood methods using variants of the EM algorithm, Bayesian methods using variants of Gibbs sampling, and multiple-imputation techniques. The methods developed will be compared with each other (when applicable) and with alternatives from the literature such as complete-case analysis and single imputation, using both the actual study data and simulated data that are plausible realizations of study results derived from known generating mechanisms. These comparisons will explore the performance of the methods under various conditions involving the sample size, the amount of missing data, the mechanism causing missing data, and the true model for the data.
Performance criteria will include validity of inferences, efficiency, bias, and robustness to model violations.
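As a brief illustration of the multiple-imputation approach named above, the sketch below pools the results of several completed-data analyses using Rubin's standard combining rules. It shows the generic textbook rule only and is not the procedure to be developed in this project. The function name `pool_rubin` and the numerical inputs are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates of one parameter, one per imputed data set
    variances: the corresponding squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t


# Hypothetical log-odds-ratio estimates and squared standard errors
# from five imputed data sets (illustrative numbers, not study results).
est = [0.50, 0.60, 0.40, 0.55, 0.45]
within_vars = [0.040, 0.050, 0.045, 0.040, 0.050]
pooled, total_var = pool_rubin(est, within_vars)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.3f}")
```

The between-imputation term is what separates this from single imputation: it carries the uncertainty due to the missing values into the final standard error, which bears on the validity-of-inference criterion listed above.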
Zhuang, D; Schenker, N; Taylor, J M et al. (2000) Analysing the effects of anaemia on local recurrence of head and neck cancer when covariate values are missing. Stat Med 19:1237-49
Cho, M; Schenker, N (1999) Fitting the log-F accelerated failure time model with incomplete covariate data. Biometrics 55:826-33
Bycott, P; Taylor, J (1998) A comparison of smoothing techniques for CD4 data measured with error in a time-dependent Cox proportional hazards model. Stat Med 17:2061-77