Quantile regression is emerging as an important and active research area driven by diverse applications. Compared with conventional least squares regression, quantile regression methods are robust against outliers and can capture heterogeneity in how covariates affect different parts of the response distribution. In recent years, significant results on estimation and variable selection for quantile regression have been obtained for big data applications. However, little work exists on inferential methods, including hypothesis testing and confidence interval construction, for quantile regression in the high-dimensional setting. This project seeks to develop new statistical theory, methodology, and algorithms to address inference problems in high-dimensional quantile regression. The research is motivated by a large clinical trial of diabetes intervention, and the developed methods can also be applied to data from genome-wide association studies, neuroscience, and environmental studies.
The research will focus on two main directions. First, new testing procedures will be developed to assess the overall significance of the effects of high-dimensional covariates on quantiles of the response distribution. Two types of tests are proposed: one based on a maximum-type statistic built from marginal quantile regression, and one based on a score-type statistic. Theory and methods for both fixed and diverging dimensions will be studied. Second, focusing on high-dimensional quantile regression without the conventional minimum signal strength condition on the coefficients, the PI will rigorously study the asymptotic theory of penalized estimators and develop valid post-selection inference methods based on both asymptotic theory and bootstrap procedures. The PI will integrate research and education by developing advanced topics courses; engaging graduate and undergraduate students, especially those from underrepresented groups, in the project; and reaching out to K-12 students and developing countries through collaboration and knowledge sharing.
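To give a rough sense of the first research direction, the sketch below computes a maximum-type statistic over marginal quantile score statistics and calibrates it by permutation. It is an illustration under stated assumptions, not the procedure the project proposes: the function name `max_marginal_score_test`, the use of marginal scores in place of fitted marginal quantile regression slopes, and the permutation calibration (which assumes the response is independent of all covariates under the null) are all choices made here for simplicity.

```python
import numpy as np


def max_marginal_score_test(X, y, tau=0.5, n_perm=500, seed=0):
    """Illustrative max-type test that no covariate shifts the tau-th quantile of y.

    Hypothetical helper: returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    # Standardize each covariate so the marginal statistics are comparable.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    # Quantile score under the null: tau - 1{y <= unconditional tau-quantile}.
    psi = tau - (y <= np.quantile(y, tau))
    scale = np.sqrt(n * tau * (1.0 - tau))

    def stat(scores):
        # Largest absolute standardized marginal score across covariates.
        return np.max(np.abs(Xs.T @ scores)) / scale

    observed = stat(psi)

    # Assumption: y is exchangeable with respect to X under the null,
    # so permuting the scores gives a reference distribution.
    perm = np.array([stat(rng.permutation(psi)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return observed, p_value
```

Calling `max_marginal_score_test(X, y, tau=0.5)` with an `n` by `p` covariate matrix `X` and a length-`n` response `y` returns the statistic and its permutation p-value. Taking the maximum over covariates targets sparse alternatives in which only a few covariates matter; theory for diverging dimension would need a more careful calibration than this permutation shortcut.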