Bandit problems have been studied in many different contexts and variations. The vast majority of work focuses on a static environment, in which, at each time, the probability distribution of the reward yielded by each action remains unchanged. This static model may clearly fail to produce decision strategies that are optimal in a dynamically changing environment. Despite their sounding relevance in practical applications, such environments have received sporadic attention in the statistical community so far. The contribution of the proposed research to the current state of knowledge will consist in proposing models for new dynamic environments that are motivated by a significant class of applications, designing policies that adapt to dynamic environments, analyzing the performance of these policies and assessing optimality from a finite time (non asymptotic) point of view.

Sequential allocation in dynamic environments is a problem that arises at the intersection of nonparametric statistics, machine learning and operations research. This project involves techniques from these fields and points out fundamental bridges between the extant results to form a more unified theory of the subject. This theory will then serve as a basis for producing computationally efficient allocation policies with potential applications in clinical trials, drug discovery and real time web page content optimization.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
0906424
Program Officer
Gabor J. Szekely
Project Start
Project End
Budget Start
2009-07-15
Budget End
2013-06-30
Support Year
Fiscal Year
2009
Total Cost
$110,000
Indirect Cost
Name
Princeton University
Department
Type
DUNS #
City
Princeton
State
NJ
Country
United States
Zip Code
08540