Statistical model building is an important part of scientific discovery. In the big data era, high-dimensional data arise frequently. Model selection with high-dimensional features in linear models, generalized linear models, and models for censored data has been a very active area of research in recent years. The PI aims to develop new model selection algorithms, within a Bayesian computational framework, that scale to high-dimensional problems. The proposed research is motivated by collaborations with scientists in atmospheric sciences, genetics, and kinesiology, and aims to develop methodologies that are broadly applicable in statistical modeling and data analysis.

Much of the recent work has focused on shrinkage through penalization or regularization. Bayesian computational methods, interpreted broadly, play a valuable role in statistics, including model selection and estimation, but they face important hurdles in high-dimensional statistics, both in theoretical intricacy and in computational scalability. The PI aims to develop a theoretical framework that demonstrates model selection consistency from the frequentist perspective and offers insight into why Bayesian model selection methods can provide an asymptotic approximation to the L0 penalty. An important part of the proposed work is the development of a modified Gibbs sampler for selecting sparse models that is much more scalable than standard MCMC algorithms when the number of variables is large. Bayesian methods are especially useful in problems with non-convex objective functions, where Bayesian computation can perform more robustly than direct optimization. A primary example of such a problem considered in the project is quantile regression for censored data. In addition to model selection, the PI proposes a new estimation method for censored quantile regression that promises to be computationally and statistically efficient. Equally important, the new method adapts easily to general forms of censoring that other estimation methods have found difficult to handle. The PI will continue integrating research with education by working with PhD students and by providing research experiences for undergraduate students. The research output will be disseminated through conferences and workshops and through publication in widely read journals in statistical science.
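To illustrate why censored quantile regression leads to a non-convex objective, the following is a minimal sketch of one classical formulation, a Powell-type objective for responses left-censored at a fixed point. It is not the estimator proposed in this project; the function names, the censoring point `c`, and the toy data are assumptions made purely for illustration.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def censored_quantile_objective(beta, X, y, tau, c=0.0):
    """Powell-type objective for responses left-censored at c (illustrative).

    Computes sum_i rho_tau(y_i - max(c, x_i' beta)). The max(.) inside the
    check loss is what makes the objective non-convex in beta.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, x_i))
        total += check_loss(y_i - max(c, fitted), tau)
    return total


# Toy data (hypothetical): one observation, one covariate, median regression.
X = [[1.0]]
y = [1.0]
tau = 0.5

f_left = censored_quantile_objective([-1.0], X, y, tau)
f_mid = censored_quantile_objective([0.0], X, y, tau)
f_right = censored_quantile_objective([1.0], X, y, tau)

# A convex function would satisfy f_mid <= (f_left + f_right) / 2.
convexity_violated = f_mid > 0.5 * (f_left + f_right)
print(f_left, f_mid, f_right, convexity_violated)
```

In this toy case the objective is flat for negative coefficients and only starts decreasing once the fitted value exceeds the censoring point, so the midpoint inequality required for convexity fails. Flat regions and local minima of this kind are one reason direct optimization can be fragile in such problems.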

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1607840
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2016-09-01
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2016
Total Cost: $300,000
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109