The research objective of this project is the study of problems in adaptive resource allocation, where the number of available actions is very large or infinite. In these problems, the decision maker chooses an action at each time step and observes some rewards, but the distribution of these rewards is initially unknown. Information is acquired during the decision making process. This introduces a tradeoff between choosing actions that appear most profitable given the available information, and choosing actions that result in additional information on unexplored possibilities. The research will formulate and study a new model, where the rewards of different actions are not independent of each other, but are determined by a small number of underlying random variables. By exploiting the correlation among the rewards, this research will develop policies that quickly identify the optimal action. The models and polices developed in this project will be applied to problems such as web advertising, price optimization, and supply chain management.

If successful, the project will result in novel methodologies with immediate usefulness in many applications. The research will also establish the best possible regret and risk that can be attained in such models (as a function of time and of the number of underlying variables), by developing lower bounds on regret and policies that achieve (or nearly achieve) the lower bounds. In addition, the project will result in novel frameworks involving non-stationary environments, and establish a connection with classical problems in adaptive stochastic control. This work will provide a new perspective on some important classes of adaptive decision making problems, including some novel formulations and directions in this field.

Project Start
Project End
Budget Start
2011-08-15
Budget End
2012-06-30
Support Year
Fiscal Year
2011
Total Cost
$36,699
Indirect Cost
Name
University of Southern California
Department
Type
DUNS #
City
Los Angeles
State
CA
Country
United States
Zip Code
90089