The objectives of this project are to develop efficient and reliable algorithms for direct reinforcement, to learn risk-averse behaviors for problems with high degrees of uncertainty, and to apply the methods developed to an economically important problem: global asset allocation. Reinforcement learning (RL) enables a goal-directed agent to discover strategies through trial-and-error exploration with only limited feedback. Direct Reinforcement (DR, or "policy gradient") methods enable an agent to discover a strategy without the need to learn a value function.

Dynamic programming and related value-function RL methods are often found to be inefficient, to produce unstable solutions, and to have difficulty scaling up to large problems. Hence, there have been relatively few real-world applications of value-function-based RL. This project seeks to make several advancements in Direct Reinforcement that will enable the development of efficient and effective practical applications.

By controlling the "exploration vs. exploitation" trade-off during on-line learning, DR agents will be able to discover better policies and do so more efficiently. Stochastic optimization methods, such as stochastic "search then converge" schedules or annealing of a Boltzmann temperature, are candidate approaches. By developing risk-averse reinforcement methods, DR agents will be able to learn robust policies for uncertain or risky environments. Using risk-sensitive intertemporal utilities, DR agents will learn to avoid risky states or actions while they pursue long-term reward. Dynamic programming is widely used in economics and finance, but few attempts have been made to solve important financial problems with reinforcement learning. As a demonstration of risk-averse DR, this project will build a prototype global asset allocation system.
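To make these ideas concrete, the following is a minimal sketch (not the project's actual method) of a policy-gradient agent that combines the two ingredients named above: an annealed Boltzmann temperature controlling exploration, and a risk-sensitive utility of the form U = mean(r) − λ·var(r) in place of raw expected reward. The two-action environment, the risk-aversion weight `LAMBDA`, and the annealing schedule are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Illustrative environment: action 0 is "safe" (low mean, low variance),
# action 1 is "risky" (higher mean, much higher variance).
def reward(action):
    if action == 0:
        return random.gauss(0.5, 0.1)
    return random.gauss(1.0, 2.0)

LAMBDA = 1.0              # risk-aversion weight in U = mean - LAMBDA * variance
preferences = [0.0, 0.0]  # softmax policy parameters, one per action

def policy(temp):
    """Boltzmann (softmax) action probabilities at temperature temp."""
    m = max(preferences)  # subtract max for numerical stability
    z = [math.exp((p - m) / temp) for p in preferences]
    s = sum(z)
    return [v / s for v in z]

alpha = 0.05
for t in range(5000):
    temp = max(0.1, 2.0 * 0.999 ** t)   # annealed Boltzmann temperature
    probs = policy(temp)
    a = 0 if random.random() < probs[0] else 1
    # Estimate a risk-adjusted utility for the chosen action from a small batch.
    rs = [reward(a) for _ in range(10)]
    mean = sum(rs) / len(rs)
    var = sum((r - mean) ** 2 for r in rs) / len(rs)
    u = mean - LAMBDA * var             # risk-sensitive utility
    # REINFORCE-style gradient of log pi(a) w.r.t. each preference.
    for i in range(2):
        grad = ((1.0 if i == a else 0.0) - probs[i]) / temp
        preferences[i] += alpha * u * grad

final = policy(0.1)
```

Under these assumptions the risky action's variance penalty (λ·σ² ≈ 4) outweighs its higher mean, so the learned policy concentrates on the safe action; with λ = 0 the same agent would instead favor the higher-mean risky action.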

Risk-averse Direct Reinforcement may find application in a variety of engineering domains, from robotics to industrial control to autonomous agents. Many industries, such as energy and the airlines, need to manage operational and financial risks together in order to avoid supply shortfalls or bankruptcy. Individual investors must likewise manage risk while building their investment portfolios to meet future needs, such as children's college expenses or retirement.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0342634
Program Officer: Douglas H. Fisher
Budget Start: 2003-08-01
Budget End: 2008-07-31
Fiscal Year: 2003
Total Cost: $339,998
Name: International Computer Science Institute
City: Berkeley
State: CA
Country: United States
Zip Code: 94704