Multi-armed Bandit Problems with Covariates

Yang, Yuhong

Abstract

Multi-armed bandit (MAB) refers to a class of sequential decision-making problems in which, at each step, one must choose a population (arm) from which a random reward is generated. The goal is to maximize the total accumulated reward. With few exceptions, the MAB literature ignores available covariates. In this project, the PI will study MAB with covariates within general frameworks and develop methodology and theory for various applications. The project will 1) provide methods for selecting key covariates; 2) establish consistency in variable selection; 3) establish consistency of the allocation rule in terms of the accumulated reward; and 4) derive the rate of convergence of the accumulated reward relative to the oracle choices. In addition, nonparametric estimation of the mean reward functions and model combination will be used to achieve higher expected reward. The project will also seek strategies that simultaneously achieve high expected reward and provide enough information to identify the best arm with high probability.
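The allocation problem and the oracle comparison described above can be made concrete with a small simulation. The sketch below is a generic randomized (epsilon-greedy) allocation rule that estimates each arm's mean reward function by k-nearest-neighbour averaging; it is not necessarily the allocation rule developed in this project. It assumes a single covariate drawn uniformly on [0, 1], Gaussian reward noise, and an exploration probability that decays like 1/sqrt(t). The function names `knn_estimate` and `run_allocation` are hypothetical. Regret is the accumulated shortfall relative to the oracle that knows the mean reward functions and always picks the best arm for the observed covariate.

```python
import math
import random


def knn_estimate(history, x, k):
    """Average reward of the k past observations for one arm whose covariates
    are closest to x; returns None if the arm has not been played yet."""
    if not history:
        return None
    nearest = sorted(history, key=lambda obs: abs(obs[0] - x))[:k]
    return sum(r for _, r in nearest) / len(nearest)


def run_allocation(mean_fns, horizon, k=10, seed=0):
    """Hypothetical epsilon-greedy allocation with covariates.

    mean_fns: list of functions f_i(x) giving the mean reward of arm i.
    Returns (cumulative_reward, cumulative_regret), where regret is measured
    against the oracle that always plays argmax_i f_i(x).
    """
    rng = random.Random(seed)
    histories = [[] for _ in mean_fns]
    total_reward = 0.0
    regret = 0.0
    for t in range(1, horizon + 1):
        x = rng.random()                      # covariate observed before choosing
        eps = min(1.0, 1.0 / math.sqrt(t))    # exploration probability decays with t
        estimates = [knn_estimate(h, x, k) for h in histories]
        untried = [i for i, e in enumerate(estimates) if e is None]
        if untried:
            arm = untried[0]                  # play every arm at least once
        elif rng.random() < eps:
            arm = rng.randrange(len(mean_fns))  # explore uniformly at random
        else:
            arm = max(range(len(mean_fns)), key=lambda i: estimates[i])  # exploit
        reward = mean_fns[arm](x) + rng.gauss(0.0, 0.1)  # noisy observed reward
        histories[arm].append((x, reward))
        total_reward += reward
        best = max(f(x) for f in mean_fns)
        regret += best - mean_fns[arm](x)     # shortfall versus the oracle choice
    return total_reward, regret
```

For example, with two arms whose mean rewards are `x` and `1 - x`, the best arm switches at x = 0.5, and any rule that ignores the covariate has an expected regret of 0.25 per round. Calling `run_allocation([lambda x: x, lambda x: 1 - x], horizon=1000)` should give an average regret per round well below that level. Epsilon-greedy is used here only because it is short to write. Other exploration schedules or reward estimators could be substituted without changing the structure of the loop.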

In medical practice, treatments previously shown to be the best at the population level in clinical trials are given to new patients with minimal consideration of their own personal characteristics, such as genetic profile. Where practically feasible, there is every reason to treat a patient in a way that takes into account the outcomes of all previous treatments of patients with the same disease, so that the most promising individualized treatment is selected based on genetic information, clinical assessments, and all accumulated trial and treatment results. The proposed research will set up statistical frameworks and build theory and methodology for individualized medicine using the statistical machinery of sequential allocation with covariates. Besides medicine, sequential allocation has applications in operations research, industrial engineering, economics, and other fields. Because modern technology has made information far easier to obtain and process, new research that makes effective use of key predictors will allow applications of sequential allocation with covariates to make a real impact: saving lives, improving health, promoting business, and reducing operating costs for society.

Project Report

Multi-armed bandit problems with covariates (MABC) have important applications in various areas. The themes of the funded research are the development of new optimal sequential decision rules (choosing the best arm) when covariates are available, the derivation of methods for related model selection and combination problems, and the construction of model selection diagnostic tools for better data modeling. The goals of the project were well achieved. One major goal of the project was to obtain optimal allocation rules for MABC. The new allocation methods from this research overcome several major limitations of previous methods in the literature that hinder real applications. Specifically, new technical tools were built to allow the use of more powerful nonparametric regression techniques for estimating reward functions under much weaker structural assumptions, and the new methods are adaptive to both the smoothness of the reward functions and the margin conditions. Furthermore, model selection and combination can be flexibly incorporated into the estimation and allocation process. An application to web-based personalized news article recommendation demonstrates the advantages of the new nonparametric MABC methods. In addition, robust forecast combination methods that can effectively handle heavy-tailed errors were constructed; these can produce more reliable personalized predictions for making optimal individualized decisions.

High-dimensional regression under soft or weak sparsity has attracted theoretical interest in recent years because of its wide range of potential applications. This research built the first estimators that attain the minimax rate for high-dimensional linear regression adaptively over a whole scale of soft and hard sparsity. The capability of sparse linear approximation to high-dimensional functions plays a fundamental role in a thorough theoretical understanding of high-dimensional regression.
This research has provided a sharp characterization of how many terms are needed to approximate well a function whose optimal coefficients lie in a soft sparse ball. The bounds have various implications for high-dimensional modeling.

Estimation of the conditional treatment effect is a very important topic for personalized decision making. Until now, there has been little work on model selection that applies to situations with possible model misspecification and/or the use of nonparametric methods. This project developed a cross-validation method for treatment effect estimation that can be used to compare both parametric and nonparametric methods.

For high-dimensional data, one typically selects a sparse subset of the original predictors by placing a sparsity-encouraging penalty on the parameters, but the reliability of the selected subset is unclear. In this project, model selection confidence sets were explored, and model selection diagnostic measures were proposed and validated both theoretically and numerically. These tools give the data analyst a proper sense of the reliability of a selected model, enabling better inference. An R package developed in this project is available for public use.

The project has resulted in over 15 publications in statistics and related fields, including economics, machine learning, mathematics, and forecasting. The results have been followed up in those fields in a variety of research and applications. In particular, Nan and Yang (2014), published in the Journal of Computational and Graphical Statistics, is listed by Taylor & Francis as one of the Top 10 Most Read Articles of 2014 in Statistics. The project has also provided good training for students at the doctoral, MS, and BS levels in STEM.
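The concern about how reliable a selected sparse model is can be illustrated with a simple bootstrap check of selection stability. This is a generic sketch, not the diagnostic measures or the R package from this project. It assumes a small number of predictors and uses exhaustive best-subset selection by BIC in place of a sparsity-encouraging penalty. The helper names `solve`, `rss`, `bic_select`, and `selection_frequencies` are hypothetical. A selected model that appears in only a small fraction of resamples deserves less trust.

```python
import itertools
import math
import random
from collections import Counter


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / m[i][i]
    return beta


def rss(xs, ys, subset):
    """Residual sum of squares of OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [x[j] for j in subset] for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))


def bic_select(xs, ys, n_pred):
    """Exhaustive best-subset selection by BIC; returns the chosen subset as a tuple."""
    n = len(ys)
    best, best_bic = None, math.inf
    for size in range(n_pred + 1):
        for subset in itertools.combinations(range(n_pred), size):
            bic = n * math.log(rss(xs, ys, subset) / n) + (size + 1) * math.log(n)
            if bic < best_bic:
                best, best_bic = subset, bic
    return best


def selection_frequencies(xs, ys, n_pred, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each subset is selected."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(ys)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        counts[bic_select([xs[i] for i in idx], [ys[i] for i in idx], n_pred)] += 1
    return {s: c / n_boot for s, c in counts.items()}
```

When one predictor carries a strong signal, every resample is expected to keep it. The frequencies of the competing subsets that differ only in noise predictors then show how unstable the rest of the selection is. The exhaustive search costs 2^p fits per resample, so this check is practical only for small p.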

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1106576
Program Officer: Gabor Szekely

Budget Start: 2011-06-01
Budget End: 2014-10-31
Fiscal Year: 2011
Total Cost: $249,987

Institution

University of Minnesota Twin Cities, Minneapolis, MN, United States
