Bandit problems have been studied in many different contexts and variations. The vast majority of this work focuses on a static environment, in which the probability distribution of the reward yielded by each action remains unchanged over time. Such a static model may clearly fail to produce decision strategies that are optimal in a dynamically changing environment. Despite their evident relevance to practical applications, dynamic environments have so far received only sporadic attention in the statistical community. The contribution of the proposed research to the current state of knowledge will consist in proposing models for new dynamic environments motivated by a significant class of applications, designing policies that adapt to such environments, analyzing the performance of these policies, and assessing optimality from a finite-time (non-asymptotic) point of view.
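To make the distinction concrete, one standard device from the nonstationary-bandit literature is a sliding-window variant of the classical UCB index, which discards old observations so that the policy can track reward distributions that drift or switch over time. The following minimal sketch is purely illustrative and is not the method proposed here; the window size `window`, the exploration constant `c`, and the toy reward functions are all assumptions chosen for the example.

```python
import numpy as np

def sliding_window_ucb(reward_fns, horizon, window=200, c=1.0):
    """Illustrative sliding-window UCB sketch for a nonstationary bandit.

    reward_fns: one function t -> reward sample per arm, so the reward
    distributions may change with the time index t. Only the most recent
    `window` observations of each arm are used, gradually forgetting
    stale data gathered under an earlier environment.
    """
    n_arms = len(reward_fns)
    history = [[] for _ in range(n_arms)]  # (time, reward) pairs per arm
    total = 0.0
    for t in range(horizon):
        # Drop observations that have fallen outside the sliding window.
        for h in history:
            while h and h[0][0] <= t - window:
                h.pop(0)
        counts = [len(h) for h in history]
        if min(counts) == 0:
            # Play any arm with no in-window data before using the index.
            arm = counts.index(0)
        else:
            means = [np.mean([r for _, r in h]) for h in history]
            bonus = [c * np.sqrt(np.log(min(t, window) + 1) / n) for n in counts]
            arm = int(np.argmax([m + b for m, b in zip(means, bonus)]))
        reward = reward_fns[arm](t)
        history[arm].append((t, reward))
        total += reward
    return total

# Toy usage: the identity of the best arm switches halfway through.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    arms = [
        lambda t: rng.normal(0.8 if t < 500 else 0.2, 0.1),
        lambda t: rng.normal(0.2 if t < 500 else 0.8, 0.1),
    ]
    print(sliding_window_ucb(arms, horizon=1000))
```

A static UCB policy would average over its entire history and remain committed to the pre-switch best arm long after the change; the window is what allows adaptation, at the price of a larger confidence bonus from the reduced sample size.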
Sequential allocation in dynamic environments is a problem that arises at the intersection of nonparametric statistics, machine learning, and operations research. This project draws on techniques from all three fields and establishes fundamental bridges between existing results to form a more unified theory of the subject. This theory will then serve as a basis for producing computationally efficient allocation policies with potential applications in clinical trials, drug discovery, and real-time web page content optimization.