RI: Small: Reinforcement Learning by Mirror Descent

Mahadevan, Sridhar

Abstract

A fundamental challenge in machine learning is the design of computational agents that, rather than being explicitly programmed, autonomously learn complex tasks in stochastic real-world environments. Past approaches, such as reinforcement learning algorithms for solving Markov decision processes, scale poorly to large state spaces. The proposed research addresses this curse of dimensionality by investigating a novel framework combining reinforcement learning and online convex optimization, in particular mirror descent and related algorithms. Mirror descent scales significantly better than classical first-order gradient descent in high-dimensional state spaces, by using a distance-generating function specific to a particular state space geometry.

The proposed framework enables several significant algorithmic advances in the design of autonomous machine learning agents: a new class of first-order mirror-descent based methods for learning sparse solutions to Markov decision processes will be developed that scale significantly significantly better than previous second-order methods; novel hierarchical methods for solving semi-Markov decision processes will be investigated; and finally, applications to a variety of high-dimensional Markov decision processes will be explored.

The anticipated outcomes of the proposed work include foundational advances in designing autonomous agents that learn to solve sequential decision-making problems, which will impact a large number of target applications from manufacturing to robotics and scheduling. The educational goal includes the development of a graduate-level course in online convex optimization for sequential decision-making, as well as interdisciplinary tutorials to enhance the cross-fertilization of ideas from applied mathematics and optimization to machine learning and artificial intelligence.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1216467
Program Officer: Weng-keen Wong

Project Start
Project End
Budget Start: 2012-08-01
Budget End: 2016-07-31
Support Year
Fiscal Year: 2012
Total Cost: $449,991
Indirect Cost

RI: Small: Reinforcement Learning by Mirror Descent
Mahadevan, Sridhar
University of Massachusetts Amherst, Hadley, MA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments