The modern era of Big Data brings unique opportunities as well as challenges to statisticians. While the Big Data revolution offers a great opportunity to obtain valuable and profound insights from the richness of data and to enhance data-driven decision making, it also demands innovation and knowledge discovery from statisticians and data scientists in three crucial aspects: (i) development of flexible models that can appropriately describe the complexities of the data, (ii) efficient and valid statistical estimation and inferential procedures, and (iii) development of computational algorithms that scale up to large datasets. The purpose of this project is to make advances in all three aspects by fully exploring the Bayesian framework, which treats the parameters of a model as random and provides an efficient mechanism to quantify their uncertainty. In particular, the techniques developed will be useful for analyzing datasets containing a large number of covariates, for learning the dependence structures between a large number of outcome variables, and for obtaining a comprehensive description of the impact of covariates on outcome variables by modeling their relationships at different quantile levels. The research will have an impact on statistical practice in various disciplines, including biology, economics, environmental sciences, marketing, and medical sciences. The training component will integrate research into teaching by offering special topics courses to graduate students based on the proposed research and by developing undergraduate research projects that incorporate research concepts at an accessible level. The PI will mentor high school research projects and organize a K-12 outreach workshop to expose high school students and teachers to modern statistics and its applications.
Statistically rigorous and computationally efficient Bayesian methodologies and inferential procedures will be developed that apply to a variety of complex high-dimensional models, including generalized linear models, quantile regression models, and graphical models. General classes of Bayesian regularization priors will be proposed, and their regularization properties will be rigorously studied for a variety of commonly used likelihood functions. In contrast to most existing Bayesian approaches, which focus on high-dimensional estimation, a novel framework for high-dimensional Bayesian inference with valid frequentist properties will be developed. Scalable computational techniques that do not involve large matrix operations will be devised, both for obtaining point estimators from the posteriors and for sampling the full posterior distributions, and their statistical properties will be studied. An attractive feature of these computational developments is that they will be applicable to a diverse range of statistical models commonly used in practice. The research is closely related to several highly active areas of modern statistics, including high-dimensional modeling, Bayesian computation, nonconvex regularization, post-selection inference, graphical models, and quantile regression, and will contribute to the advancement of, and interaction between, these areas.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.