Missing covariate values are common in studies of risk factors for diseases and in many other biomedical studies. Complete-case analysis, which is routinely used, can suffer from bias in addition to loss of efficiency. Current advanced statistical methods for analyzing such data see limited use in practice because of concerns about robustness, difficulty of implementation, or both. This project aims to develop new statistical methods for modeling missing covariates in regression models so that inferences on regression parameters are robust, efficient, and easy to implement. The objective will be pursued in four steps: (1) A general semiparametric odds ratio model is proposed for complex missing data problems. The proposed model makes the likelihood approach commonly used in practice more robust, more flexible, and easier to apply. (2) The likelihood method for regression with missing data is further robustified in three ways. When missing-data patterns are relatively simple, smoothing spline models for the odds ratio function are proposed; when missing-data patterns are complex, the likelihood estimator is modified to be doubly robust and locally efficient; and a framework is proposed for sensitivity analysis under general missing data mechanisms. (3) For problems with a large number of covariates subject to missing values, model selection procedures based on imputed complete data under the semiparametric covariate model are studied. Such procedures can be very helpful in studying risk factors for health events, for example in identifying risk factors for bone fracture from a set of potential risk factors subject to missing values. (4) For all the missing data problems under consideration, software implementing the resulting methods will be developed and disseminated.
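To make the claim about complete-case bias concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than part of the project: the function name `complete_case_slope_demo`, the linear model y = x + e with standard normal x and e, and the rule that x is set to missing with probability 0.8 whenever y > 0. Because missingness depends only on the fully observed outcome y, x is missing at random (MAR), yet ordinary least squares on the complete cases still underestimates the true slope of 1.0.

```python
import numpy as np


def complete_case_slope_demo(n=100_000, seed=0):
    """Compare full-data and complete-case OLS slopes when a covariate is MAR.

    Hypothetical data-generating model (an assumption for illustration only):
        x ~ N(0, 1),  e ~ N(0, 1),  y = 1.0 * x + e
    The covariate x is set to missing with probability 0.8 whenever y > 0 and
    is always observed when y <= 0, so missingness depends only on the
    fully observed outcome y (missing at random).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)  # true regression slope is 1.0

    # Missingness indicator for x, driven by the observed outcome y.
    missing = (y > 0) & (rng.uniform(size=n) < 0.8)
    observed = ~missing

    # np.polyfit with deg=1 returns the coefficients [slope, intercept].
    full_slope = np.polyfit(x, y, 1)[0]
    cc_slope = np.polyfit(x[observed], y[observed], 1)[0]
    return full_slope, cc_slope, int(observed.sum())


if __name__ == "__main__":
    full, cc, n_cc = complete_case_slope_demo()
    print(f"full-data slope:     {full:.3f}")
    print(f"complete-case slope: {cc:.3f} (based on {n_cc} of 100000 rows)")
```

Under this assumed model the complete-case slope converges to roughly 0.835 rather than 1.0, and only about 60% of the rows are used, so both bias and efficiency loss show up. By contrast, if missingness depended only on x, the complete-case slope in a correctly specified linear model would remain consistent; it is the dependence on the outcome that breaks it here.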
The proposed research, when completed, will make analyses of biomedical data with missing covariate values more accessible to researchers in many applied fields and thus promote efficient use of valuable data, such as those from HIV and cancer studies.
Chen, Hua Yun (2011) Representations of efficient score for coarse data problems based on Neumann series expansion. Ann Inst Stat Math 63:497-509
Chen, Hua Yun; Xie, Hui; Qian, Yi (2011) Multiple imputation for missing values through conditional semiparametric odds ratio models. Biometrics 67:799-809
Chen, Hua Yun (2010) On L convergence of Neumann series approximation in missing data problems. Stat Probab Lett 80:864-873
Chen, Hua Yun (2010) Compatibility of conditionally specified models. Stat Probab Lett 80:670-677
Chen, Hua Yun; Gao, Shasha (2009) Estimation of average treatment effect with incompletely observed longitudinal data: application to a smoking cessation study. Stat Med 28:2451-2472
Chen, Hua Yun (2009) Estimation and inference based on Neumann series approximation to locally efficient score in missing data problems. Scand Stat Theory Appl 36:713-734
Chen, Hua Yun (2007) A semiparametric odds ratio model for measuring association. Biometrics 63:413-421