In many areas of empirical research, regression models often are used to make predictions in order to answer questions that aim to advance research and society. Such advancements hinge critically on accurate predictions. Commonly used regression models, however, can yield inaccurate predictions because they make questionable assumptions that often are violated by data. Questionable assumptions include that the dependent variable has linear relationships with the predictor variables, that the variance of the dependent variable does not change with the predictor variables, and that the dependent variable and random predictor effects are normally distributed. This research will investigate and develop a Bayesian nonparametric (BNP) regression model that allows the entire distribution (density) of the dependent variable to change flexibly and nonlinearly with the predictor variables. The BNP model will be defined by a predictor-dependent infinite mixture of unimodal distributions, with each unimodal distribution modeled by an infinite mixture of uniform distributions. A prior distribution on all parameters will complete the specification of the BNP model. The BNP regression model and a multi-level version of the model will be applied to analyze at least three large data sets: (1) to evaluate the effect of a new teacher education curriculum on the basic skills of its students; (2) to study the conditions under which urban school students find texts meaningful to read; and (3) to study the predictors of heritability of antisocial behavior in a meta-regression analysis. Furthermore, the performance of the BNP regression model will be evaluated through the analysis of many simulated data sets for a wide range of data-generating conditions. For all the real and simulated data sets, it is expected that the BNP model will show better predictive accuracy and will provide more scientific insights when compared with other regression models in common use.
The research project will advance statistical science through the development and thorough investigation of a novel BNP regression model. This model is expected to outperform other regression models that currently are used for making predictions. Through the analysis of the three large data sets, the research will advance knowledge and scientific understanding of important societal issues, including best practices for K-12 math, science, and literacy education and the treatment of youth with emotional and behavioral disabilities. Finally, for a broad audience of researchers, this project will provide user-friendly software for performing data analysis with the BNP regression model. This will help further promote more accurate answers to research questions that are important to society.