An essential goal of modern statistical analyses across many disciplines is to gain insight into the behavior of real-world processes, both to identify important correlates of variation and to obtain improved predictions. For example, in marketing, the statistician may be interested in learning the purchasing behavior of consumers from an analysis of a database of consumer transactions that includes various consumer descriptors (e.g. age, income level, geographic location) as well as purchase amounts. The statistician would then typically attempt to build a mathematical model that characterizes the relationship between the consumer descriptors and the expenditure amount. In doing so, however, certain issues bear strongly on the model's value and effectiveness. First, the validity of a model may depend strongly on prior assumptions about the nature of the modeled process, information that can be difficult to ascertain. For instance, a consumer behavior model that builds in the simple assumption that consumers with higher income levels are always expected to purchase more may inadvertently ignore subtleties that violate this assumption when other factors are simultaneously taken into account. Second, even a valid and effective model may be such a complicated object that extracting meaningful information from it can itself be very challenging. For example, after establishing a particular set of predictors as important drivers of consumer purchasing power, a key question remains: how best to measure their relative importance in the model. Focusing on the powerful and flexible approach of Bayesian regression tree ensemble modeling, the main thrust of this project will be to extend this methodology to address these and other modeling challenges.
This new methodology will enable practitioners to address their research questions in an assumption-lean framework that allows the ensemble models to use the data to adaptively and flexibly incorporate contextual modeling assumptions. To greatly enhance interpretability, it will also provide automatic, information-based summaries of variable importance to help the practitioner understand and interpret the available descriptor information. In addition to these and further methodological contributions, the project will develop software implementing this methodology as a freely available R package, enabling practitioners to more easily leverage these developments in their practical work. The graduate student supported by this award will assist with this work.

The research will focus on three general innovations to Bayesian ensemble modeling to further enhance its ability to extract meaning from complex data within an assumption-lean framework. The first contribution will develop theoretically valid measures of variable importance. These measures will provide computationally efficient calculation of indices that meaningfully gauge the relative importance of predictor variables, both marginally and in terms of interactions. The second contribution will provide an approach to monotone shape-constrained inference that does not require any prior assumption of monotonicity. This multidimensional nonparametric regression approach will enable the discovery and estimation of any and all monotone components of the regression function, and will do so with no constraint assumptions whatsoever. The third contribution will vastly extend the applicability of Bayesian ensemble modeling by developing a generalization of BART for arbitrary response data distributions, such as dichotomous responses and count data. This major technical innovation will be based on a conjugacy-free formulation that will extend the reach of BART to many more application areas and problem types than were previously possible.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1916233
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2019-07-15
Budget End: 2022-06-30
Support Year:
Fiscal Year: 2019
Total Cost: $159,997
Indirect Cost:
Name: Arizona State University
Department:
Type:
DUNS #:
City: Tempe
State: AZ
Country: United States
Zip Code: 85281