The principal investigator (PI) aims to develop new statistical methodology for analyzing incomplete data with quantile regression, where the data are incomplete because of censoring or measurement error. The research is challenging mainly because quantile regression is designed to avoid parametric assumptions on the error distribution, so standard likelihood-based methods cannot be used. The PI will focus on three different but related problems. First, new estimation approaches based on corrected scores will be developed to account for a class of measurement errors in the covariates. Second, an index-based estimation method will be proposed for censored quantile regression to accommodate high-dimensional covariates, and penalization methods will be developed for variable selection. The third problem focuses on data with covariates subject to fixed censoring. To improve efficiency over estimators that use only the complete observations, a new multiple imputation approach based on censored quantile regression will be developed. The new imputation method can improve statistical inference not only for quantile regression but also for more general regression problems.
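The remark that quantile regression cannot rely on a likelihood can be illustrated with a minimal sketch, assuming a single covariate and a small, fully observed, error-free sample (no censoring or measurement error is handled here). Linear quantile regression estimates the coefficients by minimizing the check loss rho_tau(u) = u(tau - 1{u < 0}) summed over residuals, so no error distribution has to be specified. The names `check_loss` and `quantile_regression_line` are hypothetical, and the exhaustive pairwise search is a teaching device rather than the PI's method; practical software solves the same problem by linear programming.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_regression_line(x, y, tau):
    """Fit the tau-th conditional quantile line y ~ a + b*x (hypothetical helper).

    The objective sum_i rho_tau(y_i - a - b*x_i) is a linear program, and some
    minimizer always passes through two observations with distinct x values.
    Checking every such pair therefore gives an exact fit for small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # this pair cannot determine a finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4]
    y = [0, 1, 2, 3, 40]  # the last response is a gross outlier
    # Median regression (tau = 0.5) recovers the line through the bulk of the data.
    print(quantile_regression_line(x, y, 0.5))  # (0.0, 1.0)
```

Changing `tau` (for example to 0.1 or 0.9) targets other parts of the conditional distribution, which is what lets quantile regression describe heterogeneous covariate effects without a distributional model.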

The proposed research will have broad applicability in various fields: for instance, in microarray studies, where gene expression data are often measured with error; in survival studies, where random censoring is common; and in environmental and geological studies, where measurements are often subject to fixed censoring. For example, in contrast to conventional statistical methods, quantile regression models can help discover heterogeneous effects of drug treatments on the survival times of both high- and low-risk patients. The project will integrate research and education by developing advanced topics courses and mentoring students, especially those from under-represented groups.

Project Report

Quantile regression has emerged as a powerful alternative to least squares regression. Although research on quantile regression has grown steadily over the past decade, methodology and theory for incomplete data remain underdeveloped. The main challenge is that quantile regression is designed to avoid parametric distributional assumptions, so likelihood-based methods cannot be used. The main focus of this project was to develop theory and methodology for quantile regression with incomplete data, including censored, missing and mismeasured data. This three-year project has led to the following main outcomes:
(1) a semiparametric multiple imputation approach for linear M-regression models with censored covariates, which relaxes parametric distributional assumptions by fitting a censored linear quantile regression (Wang and Feng, 2012);
(2) a new estimation approach based on corrected scores to account for a class of covariate measurement errors in quantile regression (Wang, Stefanski and Zhu, 2012);
(3) two estimation procedures for generalized linear quantile regression for competing risks data when the failure type may be missing, and an analysis of the Mashi study data investigating the effect of formula feeding versus breastfeeding plus extended infant zidovudine prophylaxis on HIV-related death among infants born to HIV-infected mothers in Botswana (Sun, Wang and Gilbert, 2012);
(4) a new procedure for estimating the variance of censored quantile regression coefficient estimators (Pang, Lu and Wang, 2012);
(5) a simple estimator based on an informative subset for quantile regression with responses subject to fixed censoring (Tang et al., 2012);
(6) a variable selection procedure for quantile regression with censored outcomes (Wang, Zhou and Li, 2013); and
(7) a frequentist multiple imputation approach for models with covariates subject to detection limits, both for generalized linear models (Bernhardt, Wang and Zhang, 2013a) and for accelerated failure time models with responses subject to censoring (Bernhardt, Wang and Zhang, 2013b).

In addition, the PI and her collaborators developed variable selection methods for varying coefficient models (Tang, Wang and Zhu, 2012; Tang et al., 2012; Tang et al., 2013), an empirical likelihood inference approach for quantile regression models with longitudinal data (Wang and Zhu, 2012), and a penalization method for identifying differential aberrations in multiple-sample array CGH studies (Wang and Hu, 2012). The grant supported the research and travel of the PI and six Ph.D. students. The PI and her students presented results from the project at national and international conferences and departmental seminars, including 12 invited talks, seven contributed talks and three posters. Software developed under the project is publicly available on the PI's homepage.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1007420
Program Officer: Gabor Szekely
Budget Start: 2010-09-01
Budget End: 2013-08-31
Fiscal Year: 2010
Total Cost: $140,000
Organization Name: North Carolina State University Raleigh
City: Raleigh
State: NC
Country: United States
Zip Code: 27695